Abstract

Motivated by video coding applications, the problem of sequential coding of correlated sources with encoding 
and/or decoding frame-delays is studied. The fundamental tradeoffs between individual frame rates, individual frame 
distortions, and encoding/decoding frame-delays are derived in terms of a single-letter information-theoretic characterization of the rate-distortion region for general inter-frame source correlations and certain types of potentially
frame specific and coupled single-letter fidelity criteria. The sum-rate-distortion region is characterized in terms of 
generalized directed information measures highlighting their role in delayed sequential source coding problems. For 
video sources which are spatially stationary memoryless and temporally Gauss-Markov, MSE frame distortions, and 
a sum-rate constraint, our results expose the optimality of idealized differential predictive coding among all causal 
sequential coders, when the encoder uses a positive rate to describe each frame. Somewhat surprisingly, causal 
sequential encoding with one-frame-delayed noncausal sequential decoding can exactly match the sum-rate-MSE 
performance of joint coding for all nontrivial MSE-tuples satisfying certain positive semi-definiteness conditions. 
Thus, even a single frame-delay holds potential for yielding significant performance improvements. Generalizations 
to higher order Markov sources are also presented and discussed. A rate-distortion performance equivalence between causal sequential encoding with delayed noncausal sequential decoding and delayed noncausal sequential encoding with causal sequential decoding is also established.

Index Terms 

Differential predictive coded modulation, directed information, Gauss-Markov sources, mean squared error, rate-distortion theory, sequential coding, source coding, successive refinement coding, sum-rate, vector quantization, video coding.

I. Introduction 

Differential predictive coded modulation (DPCM) is a popular and well-established sequential predictive source 
compression method with a long history of development (see [1]-[8] and the references therein). DPCM has had
wide impact on the evolution of compression standards for speech, image, audio, and video coding. The classical 
DPCM system consists of a causal sequential predictive encoder and a causal sequential decoder. This is aligned with 
applications having low delay tolerance at both encoder and decoder. However, there are many interesting scenarios 
where these constraints can be relaxed. There are three additional sequential source coding systems possible when limited delays are allowed at the encoder and/or the decoder: (i) causal (C) encoder and noncausal (NC) decoder; (ii) NC-encoder and C-decoder; and (iii) NC-encoder and NC-decoder. Application examples of these include, respectively, non-real-time display of live video for C-NC, zero-delay display of non-real-time encoded video for NC-C, and non-real-time display of non-real-time video for NC-NC (see Figs. 3, 6, and 7). Of special interest, for performance comparison, is joint coding (JC), which may be interpreted as an extreme special case of the C-NC, NC-C, and NC-NC systems where all frames are jointly processed and jointly reconstructed (Fig. 3(c)).

1 This material is based upon work supported by the US National Science Foundation (NSF) under award (CAREER) CCF-0546598. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. Parts of this work were presented at ITA'07 and ISIT'07.

The goal of this work is to provide a computable (single-letter) characterization of the fundamental information- 
theoretic rate-distortion performance limits for the different scenarios and to quantify and compare the potential value 
of systems with limited encoding and decoding delays in different rate-distortion regimes. The primary motivational 
application of our study is video coding (see Section II-B) with encoding and decoding frame delays.²

To characterize the fundamental tradeoffs between individual frame-rates, individual expected frame-distortions, 
encoding and decoding frame-delays, and source inter-frame correlation, we build upon the information-theoretic 
framework of sequential coding of correlated sources. This mathematical framework was first introduced in [9] (and 
independently studied in [10], [11] under a stochastic control framework involving dynamic programming) within 
the context of the purely C-C (i.e., without frame-delays) sequential source coding system. As noted in [9], the
results for the well-known successive-refinement source coding problem (see [12]-[14]) can be derived from those
for the C-C sequential source coding problem by setting all sources to be identically equal to the same source. The 
complete (single-letter) rate-distortion region for two sources (with a remark regarding generalization to multiple 
sources) and certain types of perceptually-motivated coupled single-letter distortion criteria were derived in [9]. Our 
results cover not only the two-frame C-C problem studied in [9] but also the C-NC, the NC-C, the NC-NC, and 
the JC cases for an arbitrary number of sources and for general coupled single-letter distortion criteria. We have also
been able to simplify some of the key derivations in [9] (the C-C case). 

The benefit of decoding delay for rate versus MSE performance was investigated in [5], where the video
was modeled as a Gaussian process which is spatially independent and temporally first-order-autoregressive. An 
idealized DPCM structure was imposed on both the encoder and the decoder. In contrast to conventional rate-distortion studies of scalar DPCM systems based on scalar quantization and high-rate asymptotics (see [1]-[3] and references therein), [5] studied DPCM systems with vector-valued sources and large spatial (as opposed to high-rate) asymptotics, similar in spirit to [9]-[11] but with decoding frame-delays. The main findings of [5] were that (i) NC-decoders offer a significant relative improvement in the MSE at medium to low rates for video sources with strong temporal correlation, (ii) most of this improvement can be attained with a modest decoding frame-delay, and (iii) the gains vanish at very high and very low rates.

In contrast to the insistence on DPCM encoders and decoders in [5], here we consider arbitrary rate-constrained coding structures. When specialized to spatially stationary memoryless, temporally Gauss-Markov video sources, with MSE as the fidelity metric and a sum-rate constraint, our results reveal the information-theoretic optimality of idealized DPCM encoders and decoders for the C-C sequential coding system (Corollary 1.3). A second, somewhat surprising, finding is that for k-th order Gauss-Markov video sources with a sum-rate constraint, a C-encoder with a k-frame-delayed NC-decoder can exactly match the sum-rate-MSE performance of the joint coding system, which can wait to collect all frames of the video segment before jointly processing and jointly reconstructing them (Corollary 5.2). Interestingly, this performance equivalence does not hold for all MSE-tuples. It holds for a nontrivial subset that satisfies certain positive semi-definiteness conditions. The performance-matching region expands with increasing frame-delays allowed at the decoder until it completely coincides with the set of all reachable tuples of the JC system. A similar phenomenon holds for Bernoulli-Markov sources with a Hamming distortion metric. Thus, the benefit of even a single frame-delay can be significant. These two specific architectural results constitute the main contributions of this work.

2 Accordingly, terms like frame-delay and "causal" and "noncausal" encoding and/or decoding should be interpreted within this application context.

3 The terminology is ours.

For clarity of exposition, the proofs of achievability and converse coding theorems in this paper are limited to 
discrete, memoryless, (spatially) stationary (DMS) correlated sources taking values in finite alphabets and bounded
(but coupled) single-letter fidelity criteria. Analogous results can be established for continuous alphabets (e.g., 
Gaussian sources) and unbounded distortion criteria (e.g., MSE) using the techniques in [15] but are not discussed 
here. 

The rest of this paper is organized as follows. Delayed sequential coding systems and their associated operational 
rate-distortion regions are formulated in Section II. To preserve the underlying intuition and flow of ideas, we first
focus on 3-stage coding systems and then present natural extensions to general T-stage coding systems. Coding 
theorems and associated implications for the C-C, JC, C-NC, and NC-C systems are presented in Sections III, IV, 
V and VI respectively. Results for T-stage C-NC and NC-NC systems are presented in Sections VII and VIII. A 
detailed proof of achievability and converse coding theorems is presented only for the C-NC system with T = 3
frames. The achievability and converse results for other delayed coding systems are similar but lengthy, repetitive, 
and cumbersome, and are therefore omitted. We conclude in Section IX. 

Notation: The nonnegative cone of real numbers is denoted by $\mathbb{R}_+$ and 'iid' denotes independent and identically distributed. Vectors are denoted in boldface (e.g., $\mathbf{x}, \mathbf{X}$). The dimension of the vector will be clear from the context. With the exception of $T$ denoting the size of a group of pictures (GOP) in a video segment and $R$ denoting a rate, random quantities are denoted in upper case (e.g., $X, \mathbf{X}$), and their specific instantiations in lower case (e.g., $X = x$, $\mathbf{X} = \mathbf{x}$). When $A$ denotes a random variable, $A^n$ denotes the ordered tuple $(A_1, \ldots, A_n)$, $A_m^n$ denotes $(A_m, \ldots, A_n)$, and $A(i^-)$ denotes $(A(1), \ldots, A(i-1))$. However, for a set $\mathcal{A}$, $\mathcal{A}^n$ denotes the $n$-fold Cartesian product $\mathcal{A} \times \cdots \times \mathcal{A}$. For a function $g(a)$, $g^n(a(1), \ldots, a(n))$ denotes the samplewise function $(g(a(1)), \ldots, g(a(n)))$.

II. Problem formulation 
A. Statistical model for T correlated sources 

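$T$ correlated DMSs taking values in finite alphabets are defined by

$$\big(X_1(i), \ldots, X_T(i)\big)_{i=1}^{n} \in (\mathcal{X}_1 \times \cdots \times \mathcal{X}_T)^n, \qquad |\mathcal{X}_j| < \infty,\ \forall j = 1, \ldots, T.$$

The joint probability distribution of the sources is given by

$$\text{for } i = 1, \ldots, n, \qquad \big(X_1(i), \ldots, X_T(i)\big) \overset{\text{iid}}{\sim} p_{X_1 \cdots X_T}(x_1, \ldots, x_T).$$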

Potentially, the (spatially) iid assumption can be relaxed to spatially stationary ergodic sources by a general AEP argument, but this is not treated in this paper. Of interest are the large-$n$ asymptotics of achievable rate and distortion tuples.
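To make the "spatially iid, temporally dependent" structure concrete, the following Python sketch draws $n$ pixels for each of $T$ binary frames from a Bernoulli-Markov source (the kind of source mentioned in the Introduction). The flip-probability parameterization and the name `sample_bernoulli_markov` are illustrative assumptions, not part of the model definition above.

```python
import random


def sample_bernoulli_markov(n, flip_probs, seed=0):
    """Draw n spatially iid pixels for each of T = len(flip_probs) + 1 binary frames.

    Illustrative model (an assumption, not fixed by the text):
    X_1(i) ~ Bernoulli(1/2) and X_{j+1}(i) = X_j(i) XOR Z_j(i), where the
    Z_j(i) ~ Bernoulli(flip_probs[j-1]) are independent over i and j.
    Returns a list of T frames, each a list of n values in {0, 1}.
    """
    rng = random.Random(seed)
    frames = [[rng.randint(0, 1) for _ in range(n)]]
    for p in flip_probs:
        prev = frames[-1]
        # Each pixel flips independently of all other pixels (spatial iid),
        # so frame j+1 depends on the past only through frame j (Markov in time).
        frames.append([x ^ int(rng.random() < p) for x in prev])
    return frames
```

For example, `sample_bernoulli_markov(100_000, [0.1, 0.2])` returns three frames in which roughly 10% of the pixels change between frames 1 and 2 and roughly 20% between frames 2 and 3.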

4 This is similar to the coding of correlated parallel vector Gaussian sources but with an individual MSE constraint on each source component. 
B. Video coding application context

In Fig. 1, $\mathbf{X}_1, \ldots, \mathbf{X}_T$ represent $T$ video frames with $\mathbf{X}_j = (X_j(i))_{i=1}^n$, $j = 1, \ldots, T$. Here, $i$ denotes the discrete index of the spatial location of a picture element (pixel) relative to a certain spatial scan order (e.g., zig-zag or raster scan), and $X_j(i)$ denotes the discrete pixel intensity level at spatial location $i$ in frame number $j$. Instead of being available simultaneously for encoding, initially only $(X_1(i))_{i=1}^n$ is available, then $(X_2(i))_{i=1}^n$ "arrives", followed by $(X_3(i))_{i=1}^n$, and so on. This temporal structure captures the order in which the frames are processed. The statistical structure assumed in Section II-A above implies that the sources are spatially independent but temporally dependent.




Fig. 1. Illustrating motion-compensated video coding for T = 3 frames. 



While this is rarely an accurate statistical model for the unprocessed frames of a video segment in a scene (usually corresponding to the GOP in video coding standards), it is a reasonable approximation for the evolution of the video innovations process along optical-flow motion trajectories for groups of adjacent pixels (see [5] and references therein). This model allows arbitrary temporal correlation but assumes iid samples across space. The statistical law $p_{X_1 \cdots X_T}$ is assumed to be known here. In practice, it may be learnt from pre-operational training using clips from video databases used by video-codec standardization groups such as H.26x and MPEG-x, which is quite similar in spirit to the offline optimization of quantizer tables in commercial video codecs. Single-letter information-theoretic coding results need asymptotics along some problem dimension to exploit some version of the law of large numbers. Here, the asymptotics are in the spatial dimension, which is matched to video coding applications where it is quite typical to have frames of size $n = 352 \times 288$ pixels at 30 frames per second (full CIF⁵). It is also fairly common to code video in groups of $T = 15$ pictures.



C. Delayed sequential coding systems 

For clarity of exposition, we start the discussion with the exemplary case of three-frame systems. Systems with an arbitrary number of frames are studied in Sections VII and VIII.

• C-C systems: The causal (zero-delay) sequential encoding with causal (zero-delay) sequential decoding system is illustrated in Fig. 2. In the first stage, the video encoder can access only $\mathbf{X}_1$ and encodes it at rate $R_1$ so that the video decoder is able to reconstruct $\mathbf{X}_1$ as $\hat{\mathbf{X}}_1$ immediately. In the second stage (after one frame-delay), the encoder has access to both $\mathbf{X}_1$ and $\mathbf{X}_2$ and encodes them at rate $R_2$ so that the decoder can produce $\hat{\mathbf{X}}_2$ with help from the encoder's message in the first stage. In the final stage, the encoder has access to all three sources and encodes them at rate $R_3$, and the decoder produces $\hat{\mathbf{X}}_3$ with help from the encoder's messages from all the previous stages. Note that the processing of information by the video encoder and video decoder in different stages can be conceptually regarded as distinct source encoders and source decoders, respectively. Also note that both the encoder and the decoder are assumed to have enough memory to store all previous frames and messages.

5 CIF stands for Common Intermediate Format. Progressively scanned HDTV is typically $n = 1280 \times 720 \approx$ one million pixels at 60 frames per second.



[Figure: video frames $(X_j(1) \ldots X_j(n))$, $j = 1, 2, 3$, arranged along space and time, feed encoders Enc.1-Enc.3 at rates $R_1, R_2, R_3$; decoders Dec.1-Dec.3 output the reconstructions $(\hat{X}_j(1) \ldots \hat{X}_j(n))$.]

Fig. 2. C-C: Causal (zero-delay) sequential encoding with causal sequential decoding. Sum-rate $R_{\text{sum}} = R_1 + R_2 + R_3$.



• C-NC systems: The causal sequential encoding with one-stage delayed noncausal sequential decoding system is illustrated in Fig. 3(a). In the figure, all the encoders have access to the same sets of sources as in the C-C system shown in Fig. 2. However, the decoders are delayed (moved downwards) by one stage with respect to Fig. 2. Specifically, the first decoder observes the messages from the first two encoders to produce $\hat{\mathbf{X}}_1$. The second decoder produces $\hat{\mathbf{X}}_2$ based on all three messages from the three encoders. The third decoder also produces $\hat{\mathbf{X}}_3$ using all the messages.

• NC-C systems: The one-stage delayed noncausal sequential encoding with causal sequential decoding system is illustrated in Fig. 3(b). Compared with the C-NC system, the delay is on the encoding side. Specifically, the first encoder has access to both $\mathbf{X}_1$ and $\mathbf{X}_2$. Both the second and the third encoder have access to all three sources. The decoders have access to the same sets of messages sent by the encoders as in the C-C system.
• JC systems: Of special interest is the joint (noncausal) encoding and decoding system illustrated in Fig. 3(c). All the sources are collected by a single encoder and encoded jointly. The single decoder reconstructs all the frames simultaneously. Note that here the encoding frame delay is $(T-1)$.

Fig. 3. (a) C-NC: Causal sequential encoding with one-stage delayed noncausal sequential decoding; (b) NC-C: one-stage delayed noncausal sequential encoding with causal sequential decoding; (c) JC: $(T-1)$-stage delayed joint (noncausal) encoding with joint (noncausal) decoding.

$T$-stage sequential coding systems with $k$-stage frame-delays (see Figs. 6 and 7) are natural generalizations of the 3-stage systems discussed so far. The general cases will be discussed in detail in Sections VII and VIII.

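The C-C blocklength-$n$ encoders and decoders are formally defined by the maps

$$(\text{Enc.}\,j)\quad f_j^{(n)}: \mathcal{X}_1^n \times \cdots \times \mathcal{X}_j^n \to \{1, \ldots, M_j\},$$
$$(\text{Dec.}\,j)\quad g_j^{(n)}: \{1, \ldots, M_1\} \times \cdots \times \{1, \ldots, M_j\} \to \hat{\mathcal{X}}_j^n,$$

for $j = 1, \ldots, T$, where $(\log_2 M_j)/n$ is the $j$-th frame coding rate in bits per pixel (bpp) and $\hat{\mathcal{X}}_j$ is the $j$-th (finite cardinality) reproduction alphabet.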
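As a quick numerical reading of the rate definition, the sketch below converts a codebook size $M_j$ into bits per pixel and then into a raw bit rate for the full-CIF frame size quoted in Section II-B. The helper names are illustrative, and the bit-rate conversion deliberately ignores any real-codec overhead.

```python
import math

# Full-CIF frame size quoted in Section II-B: n = 352 x 288 pixels.
CIF_PIXELS = 352 * 288


def frame_rate_bpp(num_messages: int, n: int) -> float:
    """Rate (log2 M_j)/n of frame j in bits per pixel (bpp)."""
    return math.log2(num_messages) / n


def bits_per_second(rate_bpp: float, n: int, frames_per_second: float) -> float:
    """Bit rate implied by a per-frame rate in bpp (no container or header overhead)."""
    return rate_bpp * n * frames_per_second


# A codebook of M_j = 2**(n/2) messages per CIF frame is 0.5 bpp,
# i.e. 0.5 * 101376 * 30 = 1,520,640 bits per second at 30 frames per second.
rate = frame_rate_bpp(2 ** (CIF_PIXELS // 2), CIF_PIXELS)
print(rate, bits_per_second(rate, CIF_PIXELS, 30))
```

Python's `math.log2` accepts arbitrarily large integers, so even astronomically large codebook sizes can be converted directly.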

The formal definitions of C-NC encoders are identical to those for the C-C encoders. However, the C-NC decoders with a $k$-stage frame-delay are formally defined by the maps

$$(\text{Dec.}\,j)\quad g_j^{(n)}: \{1, \ldots, M_1\} \times \cdots \times \{1, \ldots, M_{\min\{j+k,\,T\}}\} \to \hat{\mathcal{X}}_j^n,$$

for $j = 1, \ldots, T$. Similarly, the NC-C decoder definitions are identical to those for the C-C decoders, and the NC-C encoders with a $k$-stage frame-delay are formally defined by the maps

$$(\text{Enc.}\,j)\quad f_j^{(n)}: \mathcal{X}_1^n \times \cdots \times \mathcal{X}_{\min\{j+k,\,T\}}^n \to \{1, \ldots, M_j\},$$

for $j = 1, \ldots, T$. Finally, the JC encoder and decoder are defined by the maps

$$(\text{Enc.})\quad f^{(n)}: \mathcal{X}_1^n \times \cdots \times \mathcal{X}_T^n \to \{1, \ldots, M\},$$
$$(\text{Dec.})\quad g^{(n)}: \{1, \ldots, M\} \to \hat{\mathcal{X}}_1^n \times \cdots \times \hat{\mathcal{X}}_T^n.$$

For a frame-delay $k$, there are boundary effects associated with the decoders (resp. encoders) of the last $(k+1)$ frames for the C-NC (resp. NC-C) systems. For example, the last two decoders in Fig. 3(a) are operationally equivalent to a single decoder since both use the same set of encoded messages. Although redundant, we retain the distinction of the boundary encoders/decoders for clarity and to aid comparison (see Theorem 4 in Section VI and Corollary 6.1 in Section VIII).
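The index sets in the maps above can be tabulated mechanically. The following Python sketch lists, for each stage $j$, which source frames encoder $j$ may observe and which messages decoder $j$ may use; the function names and the string labels for the systems are hypothetical conveniences. It makes the boundary effect visible: with $T = 3$ and $k = 1$, C-NC decoders 2 and 3 use the same messages.

```python
def visible(j, T, delay):
    """Indices 1..min(j + delay, T) available at stage j (1-based)."""
    return tuple(range(1, min(j + delay, T) + 1))


def access_table(system, T, k=1):
    """Return {j: (encoder source indices, decoder message indices)} for j = 1..T.

    system: "C-C", "C-NC" (k-stage delayed decoders) or "NC-C"
    (k-stage delayed encoders); k is ignored for "C-C".
    """
    if system not in ("C-C", "C-NC", "NC-C"):
        raise ValueError(f"unknown system: {system!r}")
    enc_delay = k if system == "NC-C" else 0
    dec_delay = k if system == "C-NC" else 0
    return {j: (visible(j, T, enc_delay), visible(j, T, dec_delay))
            for j in range(1, T + 1)}


for j, (enc, dec) in access_table("C-NC", T=3).items():
    print(f"stage {j}: encoder sees X{list(enc)}, decoder uses messages {list(dec)}")
```

The JC system is not listed because it has a single encoder and a single decoder that both see everything.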

D. Operational rate-distortion regions 

For each $j = 1, \ldots, T$, the pixel reproduction quality is measured by a single-letter distortion criterion. We allow coupled distortion criteria where the distortion for the current frame can depend on the reproductions in previous frames:

$$d_j: \mathcal{X}_j \times \hat{\mathcal{X}}_1 \times \cdots \times \hat{\mathcal{X}}_j \to \mathbb{R}_+.$$

The distortion criteria are assumed to be bounded, i.e.,

$$d_{j,\max} := \max_{x_j, \hat{x}_1, \ldots, \hat{x}_j} d_j(x_j, \hat{x}_1, \ldots, \hat{x}_j) < \infty.$$

The frame reproduction quality is in terms of the average pixel distortion

$$d_j^{(n)}(\mathbf{x}_j, \hat{\mathbf{x}}_1, \ldots, \hat{\mathbf{x}}_j) := \frac{1}{n} \sum_{i=1}^{n} d_j\big(x_j(i), \hat{x}_1(i), \ldots, \hat{x}_j(i)\big).$$

Of interest are the expected frame distortions $E[d_j^{(n)}(\mathbf{X}_j, \hat{\mathbf{X}}^j)]$. It is important to notice that these are frame-specific distortions as opposed to an average distortion across all frames. This makes the JC problem different from a standard parallel vector source coding problem. Also notice that these fidelity criteria reflect dependencies on previous frame reproductions. For example, the second distortion criterion is given by $d_2: \mathcal{X}_2 \times \hat{\mathcal{X}}_1 \times \hat{\mathcal{X}}_2 \to \mathbb{R}_+$ as opposed to a criterion like $d_2: \mathcal{X}_2 \times \hat{\mathcal{X}}_2 \to \mathbb{R}_+$, which is independent of previous reproductions. This model is motivated by the temporal perceptual characteristics of the human visual system, where the visibility threshold at a given pixel location depends on the luminance intensity of the same pixel in the previous frames [9].
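As an illustration of a coupled criterion, the sketch below evaluates $d_j^{(n)}$ directly from its definition. The particular criterion `d2_coupled`, which discounts squared error wherever the previous reproduction is bright, is a hypothetical stand-in for a perceptual masking effect and is not taken from [9]; on 8-bit pixels it is bounded by $255^2$, as the boundedness assumption requires.

```python
def frame_distortion(d_j, x_j, xhat_frames):
    """Average per-pixel distortion d_j^(n)(x_j, xhat_1, ..., xhat_j).

    d_j         -- single-letter criterion called as d_j(x_j(i), xhat_1(i), ..., xhat_j(i))
    x_j         -- source frame j, a sequence of n pixel values
    xhat_frames -- reproductions of frames 1..j, each a sequence of n values
    """
    n = len(x_j)
    if n == 0 or any(len(f) != n for f in xhat_frames):
        raise ValueError("frames must be non-empty and of equal length n")
    total = sum(d_j(x_j[i], *(f[i] for f in xhat_frames)) for i in range(n))
    return total / n


def d2_coupled(x2, xhat1, xhat2, bright=128, discount=0.5):
    """Hypothetical coupled criterion d_2(x_2, xhat_1, xhat_2): squared error,
    discounted where the previous reproduction xhat_1 is bright."""
    scale = discount if xhat1 > bright else 1.0
    return scale * (x2 - xhat2) ** 2


# Pixel 1: xhat_1 = 50 (dark), error 10 -> 100; pixel 2: xhat_1 = 200 (bright), 0.5 * 100 = 50.
print(frame_distortion(d2_coupled, [100, 200], [[50, 200], [110, 190]]))  # 75.0
```

An uncoupled criterion such as Hamming distortion fits the same helper by passing a single reproduction frame.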

A rate-distortion tuple $(\mathbf{R}, \mathbf{D}) = (R_1, \ldots, R_T, D_1, \ldots, D_T)$ is said to be admissible for a given delayed sequential coding system if, for every $\epsilon > 0$ and all sufficiently large $n$, there exist block encoders and decoders satisfying

$$\frac{1}{n} \log_2 M_j \le R_j + \epsilon, \tag{2.1}$$
$$E\big[d_j^{(n)}(\mathbf{X}_j, \hat{\mathbf{X}}^j)\big] \le D_j + \epsilon, \tag{2.2}$$

simultaneously for all $j = 1, \ldots, T$. For system $A \in \{\text{C-C}, \text{JC}\}$, the operational rate-distortion region $\mathcal{R}^A$ is the set of all admissible rate-distortion tuples. For system $A \in \{\text{C-NC}, \text{NC-C}\}$ with $k$-stage frame-delay, the operational rate-distortion region, denoted by $\mathcal{R}_k^A$, is the set of all admissible rate-distortion tuples. We will abbreviate $\mathcal{R}_k^A$ to $\mathcal{R}^A$ when $k = 1$. The sum-rate region, denoted by $\mathcal{R}_{\text{sum}}^A(\mathbf{D})$ (or $\mathcal{R}_{k,\text{sum}}^A(\mathbf{D})$), is the set of all admissible sum-rates $\sum_{j=1}^{T} R_j$ at the distortion tuple $\mathbf{D}$.

Note that for any given distortion tuple, the minimum rate of the JC system is also the minimum sum-rate of a C-NC or NC-C system with frame-delay $(T-1)$ for the same distortion tuple. For example, in a $(T-1)$-delayed C-NC system, all the decoders become joint decoders and the rate-distortion tuple $(R_1 = 0, \ldots, R_{T-1} = 0, R_T = R^{\text{JC}}(\mathbf{D}), \mathbf{D})$ is admissible. Hence $R^{\text{C-NC}}_{T-1,\text{sum}}(\mathbf{D}) = R^{\text{JC}}(\mathbf{D})$, where $R^{A}_{k,\text{sum}}(\mathbf{D})$ denotes the minimum sum-rate. Therefore C-NC and NC-C systems for $T = 2$ are less interesting. The first nontrivial delayed sequential coding system arises for $T = 3$ (also see the paragraph after Corollary 5.1). This is the reason for commencing the discussion with 3-stage systems.

III. Results for the 3-stage C-C system 

A. Rate-distortion region 

The C-C rate-distortion region can be formulated as a single-letter mutual information optimization problem 
subject to distortion constraints and natural Markov chains involving auxiliary and reproduction random variables 
and deterministic functions. This characterization is provided by Theorem 1. 

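Theorem 1 (C-C rate-distortion region) The single-letter rate-distortion region for a $T = 3$ frame C-C system is given by

$$
\mathcal{R}^{\text{C-C}} = \left\{ (\mathbf{R}, \mathbf{D}) \;\middle|\;
\begin{array}{l}
\exists\, U_1, U_2, \hat{X}_3, g_1(\cdot), g_2(\cdot,\cdot) \text{ s.t.} \\
R_1 \ge I(X_1; U_1), \\
R_2 \ge I(X^2; U_2 \mid U_1), \\
R_3 \ge I(X^3; \hat{X}_3 \mid U^2), \\
D_j \ge E[d_j(X_j, \hat{X}^j)],\ j = 1, 2, 3, \\
\hat{X}_1 = g_1(U_1),\ \ \hat{X}_2 = g_2(U_1, U_2), \\
U_1 - X_1 - X_2^3,\ \ U_2 - (X^2, U_1) - X_3
\end{array}
\right\} \tag{3.3}
$$

where $(U_1, U_2, \hat{X}_1, \hat{X}_2, \hat{X}_3)$ are auxiliary and reproduction random variables taking values in alphabets $(\mathcal{U}_1, \mathcal{U}_2, \hat{\mathcal{X}}_1, \hat{\mathcal{X}}_2, \hat{\mathcal{X}}_3)$ satisfying the cardinality bounds

$$|\mathcal{U}_1| \le |\mathcal{X}_1| + 6, \qquad |\mathcal{U}_2| \le |\mathcal{X}_1|^2 |\mathcal{X}_2| + 6|\mathcal{X}_1||\mathcal{X}_2| + 4,$$

and $\{g_1(\cdot), g_2(\cdot,\cdot)\}$ are deterministic functions.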

The rate-distortion region in [9] is for the 2-stage C-C problem, whereas the above region is for the 3-stage C-C problem. The above region differs from what one might expect to get from a natural extension of the 2-stage C-C rate-distortion region in [9]: the characterization in Theorem 1 has different rate inequalities and fewer Markov chain conditions than what one might expect from such an extension. One advantage of the characterization of the rate-distortion region in (3.3) is that it is more intuitive (as explained below), and this intuition carries over with little effort to the case of multiple frames (see Sections VII and VIII). Another advantage of the characterization in (3.3) is that it is convex and closed as defined. The convexity can be shown along the lines of the time-sharing argument in Appendix C.III, which is part of the converse proof of the coding theorem for C-NC systems. The closedness can be shown along the lines of the convergence argument in Appendix C.IV. Therefore, unlike the characterization provided in [9], there is no need to take the convex hull and closure in (3.3).

The proof of achievability can be carried out using standard random coding and random binning arguments and 
will be similar in spirit to the derivation for the T = 2 frame case in [9], but with a different intuitive interpretation. 
Hence we will only present the intuition and informally sketch the steps leading to the proof of Theorem 1 in the 
following paragraph. As remarked in the introduction, a detailed proof of achievability and converse results will 
be presented only for the C-NC system with T = 3 frames (Appendices II and III). The proofs of achievability
and converse results for other systems can be carried out in a similar manner but the derivations become lengthy, 
repetitive, and cumbersome, and are therefore omitted. 

The region in Theorem 1 has the following natural interpretation. First, $\mathbf{X}_1$ is quantized to $\mathbf{U}_1$ using a random codebook-1 for encoder-1 without access to $(\mathbf{X}_2, \mathbf{X}_3)$. Decoder-1 recovers $\mathbf{U}_1$ and reproduces $\mathbf{X}_1$ as $\hat{\mathbf{X}}_1 = g_1^n(\mathbf{U}_1)$. Next, the tuple $(\mathbf{X}^2, \mathbf{U}_1)$ is (jointly) quantized to $\mathbf{U}_2$ without access to $\mathbf{X}_3$ using a random codebook-2 for encoder-2. The codewords are further randomly distributed into bins, and the bin index of $\mathbf{U}_2$ is sent to the decoder. Decoder-2 identifies $\mathbf{U}_2$ from the bin with the help of $\mathbf{U}_1$ as side-information (available from decoder-1) and reproduces $\mathbf{X}_2$ as $\hat{\mathbf{X}}_2 = g_2^n(\mathbf{U}_1, \mathbf{U}_2)$. Finally, encoder-3 (jointly) quantizes $(\mathbf{X}^3, \mathbf{U}^2)$ into $\hat{\mathbf{X}}_3$ using encoder-3's random codebook, bins the codewords, and sends the bin index of $\hat{\mathbf{X}}_3$ such that decoder-3 can identify $\hat{\mathbf{X}}_3$ with the help of $\mathbf{U}^2$ as side-information available from decoders 1 and 2. The constraints on the rates and Markov chains ensure that with high probability
(for all large enough n) both encoding (quantization) and decoding (recovery) succeed and the recovered words are 
jointly strongly typical with the source words to meet the target distortions. Notice that the conditioning random 
variables that appear in the conditional mutual information expressions at each stage correspond to quantities that 
are known to both the encoding and decoding sides at that stage due to the previous stages. Using this observation, 
one can intuitively write down an achievable rate-distortion region for general delayed sequential coding systems 
by inspection. 

B. Sum-rate region 

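The sum-rate region can be obtained from the rate-distortion region $\mathcal{R}^{\text{C-C}}$ as shown in the following corollary. The main simplification is the absence of the auxiliary random variables $U^2$.

Corollary 1.1 (C-C sum-rate region) The sum-rate region for the C-C system is $\mathcal{R}^{\text{C-C}}_{\text{sum}}(\mathbf{D}) = [R^{\text{C-C}}_{\text{sum}}(\mathbf{D}), \infty)$, where the minimum sum-rate is

$$
R^{\text{C-C}}_{\text{sum}}(\mathbf{D}) = \min_{\substack{E[d_j(X_j, \hat{X}^j)] \le D_j,\ j = 1, 2, 3 \\ \hat{X}_1 - X_1 - X_2^3,\ \ \hat{X}_2 - (X^2, \hat{X}_1) - X_3}} I(X^3; \hat{X}^3). \tag{3.4}
$$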

Proof: For any point $(\mathbf{R}, \mathbf{D}) \in \mathcal{R}^{\text{C-C}}$, there exist auxiliary random variables and functions satisfying all the constraints in (3.3). Since the Markov chains $U_1 - X_1 - X_2^3$ and $U_2 - (X^2, U_1) - X_3$ hold, and $\hat{X}^2$ is a function of $U^2$, we have

$$
\begin{aligned}
R_1 + R_2 + R_3 &\ge I(X_1; U_1) + I(X^2; U_2 \mid U_1) + I(X^3; \hat{X}_3 \mid U^2) \\
&= I(X^3; U_1) + I(X^3; U_2 \mid U_1) + I(X^3; \hat{X}_3 \mid U^2) \\
&= I(X^3; U^2, \hat{X}_3) \\
&= I(X^3; U^2, \hat{X}^3) \\
&\ge I(X^3; \hat{X}^3).
\end{aligned}
$$

It can be verified that the Markov chains $\hat{X}_1 - X_1 - X_2^3$ and $\hat{X}_2 - (X^2, \hat{X}_1) - X_3$ hold. Therefore the right-hand side of (3.4) is not greater than the minimum sum-rate.

On the other hand, because $(U_1 = \hat{X}_1, U_2 = \hat{X}_2)$ is a possible choice of $(U_1, U_2)$,

$$R^{\text{C-C}}_{\text{sum}}(\mathbf{D}) = \min I(X^3; U^2, \hat{X}_3) \le \min I(X^3; \hat{X}^3),$$

where the first minimization is subject to the constraints in (3.3), and the second minimization is subject to the constraints in (3.4). Therefore (3.4) holds. ■
As will become clear in the sequel, the minimum sum-rate for any type of delayed sequential coding system is given by the minimization of the mutual information between the source random variables $X^T$ and the reproduction random variables $\hat{X}^T$ subject to several expected distortion and Markov chain constraints involving these random variables, of a form similar to (3.4).
C. Sum-rate region for Gaussian sources and MSE

In the case of Gaussian sources and MSE distortion criteria, the minimum sum-rate of any delayed sequential 
coding system (see Corollaries 1.1, 3.1, 5.1 and Theorem 2) can be achieved by reproduction random variables 
which are jointly Gaussian with the source random variables. This is contained in the following lemma. 

Lemma: If $(X_1, \ldots, X_T)$ are jointly Gaussian, the minimum value of $I(X^T; \hat{X}^T)$ subject to MSE constraints $E[(X_j - \hat{X}_j)^2] \le D_j$, $j = 1, \ldots, T$, and Markov chain constraints involving $X^T$ and $\hat{X}^T$ is achieved by reproduction random variables $\hat{X}^T$ which are jointly Gaussian with $X^T$.

Proof: Given any reproduction random vector $\hat{\mathbf{X}} = (\hat{X}_1, \ldots, \hat{X}_T)$ satisfying the MSE and Markov chain constraints, we can construct a new random vector $\tilde{\mathbf{X}} = (\tilde{X}_1, \ldots, \tilde{X}_T)$ which is jointly Gaussian with $\mathbf{X} = (X_1, \ldots, X_T)$ and has the same second-order statistics. Specifically, $\mathrm{cov}(\tilde{\mathbf{X}}) = \mathrm{cov}(\hat{\mathbf{X}})$ and $\mathrm{cov}(\mathbf{X}, \tilde{\mathbf{X}}) = \mathrm{cov}(\mathbf{X}, \hat{\mathbf{X}})$. Since MSEs are fully determined by second-order statistics, $\tilde{\mathbf{X}}$ automatically satisfies the same MSE constraints as $\hat{\mathbf{X}}$. The Markov chain constraints for $\hat{\mathbf{X}}$ imply corresponding conditional uncorrelatedness constraints for $\hat{\mathbf{X}}$, which will also hold for $\tilde{\mathbf{X}}$. Since $\tilde{\mathbf{X}}$ is jointly Gaussian with $\mathbf{X}$, conditional uncorrelatedness is equivalent to conditional independence. Therefore $\tilde{\mathbf{X}}$ will also satisfy the corresponding Markov chain constraints.

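Let the linear MMSE estimate of $\mathbf{X}$ based on $\hat{\mathbf{X}}$ be given by $A\hat{\mathbf{X}}$, where $A$ is a matrix. By the orthogonality principle, $(\mathbf{X} - A\hat{\mathbf{X}}) \perp \hat{\mathbf{X}}$; since $(\mathbf{X}, \tilde{\mathbf{X}})$ has the same second-order statistics as $(\mathbf{X}, \hat{\mathbf{X}})$, also $(\mathbf{X} - A\tilde{\mathbf{X}}) \perp \tilde{\mathbf{X}}$, and further, by the joint Gaussianity of $\mathbf{X}$ and $\tilde{\mathbf{X}}$, $(\mathbf{X} - A\tilde{\mathbf{X}}) \perp\!\!\!\perp \tilde{\mathbf{X}}$. Therefore,

$$
\begin{aligned}
I(\mathbf{X}; \hat{\mathbf{X}}) &= h(\mathbf{X}) - h(\mathbf{X} - A\hat{\mathbf{X}} \mid \hat{\mathbf{X}}) \\
&\overset{(a)}{\ge} h(\mathbf{X}) - h(\mathbf{X} - A\hat{\mathbf{X}}) \\
&\overset{(b)}{\ge} h(\mathbf{X}) - h(\mathbf{X} - A\tilde{\mathbf{X}}) \\
&\overset{(c)}{=} h(\mathbf{X}) - h(\mathbf{X} - A\tilde{\mathbf{X}} \mid \tilde{\mathbf{X}}) \\
&= I(\mathbf{X}; \tilde{\mathbf{X}}).
\end{aligned}
$$

Step (a) holds because conditioning reduces differential entropy. Step (b) holds because $(\mathbf{X} - A\tilde{\mathbf{X}})$ has the same second-order statistics as $(\mathbf{X} - A\hat{\mathbf{X}})$ and is a jointly Gaussian random vector, and a Gaussian vector has the largest differential entropy among all vectors with a given covariance. Step (c) holds because $(\mathbf{X} - A\tilde{\mathbf{X}})$ is independent of $\tilde{\mathbf{X}}$.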

In conclusion, given an arbitrary reproduction vector $\hat{\mathbf{X}}$, we can construct a Gaussian random vector $\tilde{\mathbf{X}}$ satisfying the same MSE and Markov chain constraints as $\hat{\mathbf{X}}$ with $I(\mathbf{X}; \hat{\mathbf{X}}) \ge I(\mathbf{X}; \tilde{\mathbf{X}})$. Hence the minimum value of $I(X^T; \hat{X}^T)$ subject to MSE and Markov chain constraints is achieved by a reproduction random vector which is jointly Gaussian with $\mathbf{X}$. ■

Since Gaussian vectors are characterized by means and covariance matrices, the minimum sum-rate computation 
reduces to a determinant optimization problem involving Markov chain and second-order moment constraints. 
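The determinant form comes from the identity $I(\mathbf{X}; \hat{\mathbf{X}}) = \frac{1}{2} \log \frac{\det K_{\mathbf{X}} \det K_{\hat{\mathbf{X}}}}{\det K_{(\mathbf{X}, \hat{\mathbf{X}})}}$ for jointly Gaussian vectors. The pure-Python sketch below (no NumPy, to keep it dependency-free) evaluates this in bits from a joint covariance matrix; the function names are illustrative and the small elimination routine is only meant for the low-dimensional matrices arising here.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def gaussian_mutual_info_bits(joint_cov, dim_x):
    """I(X; Xhat) in bits for zero-mean jointly Gaussian (X, Xhat).

    joint_cov is the covariance of the stacked vector (X, Xhat); its first
    dim_x coordinates belong to X.
    """
    k_x = [row[:dim_x] for row in joint_cov[:dim_x]]
    k_xhat = [row[dim_x:] for row in joint_cov[dim_x:]]
    joint_det = det(joint_cov)
    if joint_det <= 0.0:
        raise ValueError("joint covariance must be positive definite")
    return 0.5 * math.log2(det(k_x) * det(k_xhat) / joint_det)


# Scalar test channel with sigma^2 = 1, D = 1/4: cov(X, Xhat) = Var(Xhat) = 1 - D.
print(gaussian_mutual_info_bits([[1.0, 0.75], [0.75, 0.75]], dim_x=1))  # 1.0
```

The example recovers the familiar value $\frac{1}{2}\log_2(\sigma^2/D) = 1$ bit for a memoryless Gaussian source.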

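For Gauss-Markov sources, $p_{X_1 X_2 X_3} = \mathcal{N}(\mathbf{0}, \Sigma_X)$, where the covariance matrix $\Sigma_X$ has the following structure

$$
\Sigma_X = \begin{pmatrix}
\sigma_1^2 & \rho_1 \sigma_1 \sigma_2 & \rho_1 \rho_2 \sigma_1 \sigma_3 \\
\rho_1 \sigma_1 \sigma_2 & \sigma_2^2 & \rho_2 \sigma_2 \sigma_3 \\
\rho_1 \rho_2 \sigma_1 \sigma_3 & \rho_2 \sigma_2 \sigma_3 & \sigma_3^2
\end{pmatrix},
$$

which is consistent with the Markov chain relation $X_1 - X_2 - X_3$ associated with the Gauss-Markov assumption.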
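For jointly Gaussian variables, $X_1 - X_2 - X_3$ holds exactly when $\mathrm{Cov}(X_1, X_3 \mid X_2) = \Sigma_{13} - \Sigma_{12}\Sigma_{23}/\Sigma_{22}$ vanishes, and for the matrix above this equals $\rho_1\rho_2\sigma_1\sigma_3 - \rho_1\sigma_1\sigma_2 \cdot \rho_2\sigma_2\sigma_3/\sigma_2^2 = 0$. The sketch below builds $\Sigma_X$ and evaluates that quantity numerically. Allowing more than three frames (entry $(a, b)$ equal to $\rho_a \cdots \rho_{b-1}\sigma_a\sigma_b$) is our assumption of the natural first-order extension; the function names are illustrative.

```python
def gauss_markov_cov(sigmas, rhos):
    """Covariance of a first-order Gauss-Markov chain X_1 - X_2 - ... - X_T.

    sigmas[a] is the standard deviation of frame a+1 (0-based list);
    rhos[a] is the correlation coefficient between frames a+1 and a+2.
    Entry (a, b), a <= b, is rhos[a] * ... * rhos[b-1] * sigmas[a] * sigmas[b].
    """
    T = len(sigmas)
    if len(rhos) != T - 1:
        raise ValueError("need T - 1 correlation coefficients")
    cov = [[0.0] * T for _ in range(T)]
    for a in range(T):
        for b in range(a, T):
            prod = 1.0
            for r in rhos[a:b]:  # empty product (= 1) on the diagonal
                prod *= r
            cov[a][b] = cov[b][a] = prod * sigmas[a] * sigmas[b]
    return cov


def partial_cov_13_given_2(cov):
    """Cov(X_1, X_3 | X_2) for jointly Gaussian (X_1, X_2, X_3)."""
    return cov[0][2] - cov[0][1] * cov[1][2] / cov[1][1]


cov = gauss_markov_cov([1.0, 2.0, 1.5], [0.9, 0.8])
print(partial_cov_13_given_2(cov))  # zero up to floating-point rounding
```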
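Define a distortion region $\mathcal{D}^{\text{C-C}} := \{\mathbf{D} \mid D_1 \le \sigma_1^2,\ D_2 \le \tilde{\sigma}_2^2,\ D_3 \le \tilde{\sigma}_3^2\}$, where

$$\tilde{\sigma}_j^2 := \rho_{j-1}^2 \frac{\sigma_j^2}{\sigma_{j-1}^2} D_{j-1} + (1 - \rho_{j-1}^2)\, \sigma_j^2, \qquad j = 2, 3, \tag{3.5}$$

whose significance will be discussed below. The C-C minimum sum-rate evaluated for any MSE tuple $\mathbf{D}$ in this region is given by the following corollary.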



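Corollary 1.2 (C-C minimum sum-rate for Gauss-Markov sources and MSE) In the distortion region $\mathcal{D}^{\text{C-C}}$, the C-C minimum sum-rate for Gauss-Markov sources and MSE is

$$R^{\text{C-C}}_{\text{sum}}(\mathbf{D}) = \frac{1}{2} \log \frac{\sigma_1^2}{D_1} + \frac{1}{2} \log \frac{\tilde{\sigma}_2^2}{D_2} + \frac{1}{2} \log \frac{\tilde{\sigma}_3^2}{D_3}. \tag{3.6}$$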
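The quantities in (3.5) and (3.6) are simple to evaluate numerically. The sketch below computes $\tilde\sigma_j^2$, tests membership in $\mathcal{D}^{\text{C-C}}$, and returns the minimum sum-rate in bits per pixel, taking logarithms to base 2 (an assumption about units, since the formula leaves the base implicit). It is intended for three-frame inputs as in the text and assumes every $D_j > 0$; the function names are illustrative.

```python
import math


def predicted_error_vars(sigma2, rho, D):
    """[sigma_1^2, sigma~_2^2, sigma~_3^2] from (3.5).

    sigma2 -- [sigma_1^2, sigma_2^2, sigma_3^2]
    rho    -- [rho_1, rho_2]
    D      -- [D_1, D_2, D_3]
    """
    out = [sigma2[0]]
    for j in range(1, len(sigma2)):
        r2 = rho[j - 1] ** 2
        out.append(r2 * sigma2[j] / sigma2[j - 1] * D[j - 1] + (1 - r2) * sigma2[j])
    return out


def in_cc_region(sigma2, rho, D):
    """True if D lies in D^{C-C}: D_1 <= sigma_1^2 and D_j <= sigma~_j^2."""
    return all(d <= s for d, s in zip(D, predicted_error_vars(sigma2, rho, D)))


def cc_min_sum_rate_bits(sigma2, rho, D):
    """C-C minimum sum-rate (3.6) in bits per pixel (log base 2 assumed)."""
    if not in_cc_region(sigma2, rho, D):
        raise ValueError("D lies outside D^{C-C}; (3.6) is not claimed there")
    return sum(0.5 * math.log2(s / d)
               for s, d in zip(predicted_error_vars(sigma2, rho, D), D))


# Unit variances, rho_1 = rho_2 = 0.5, all targets 1/4: about 2.70 bits per pixel.
print(cc_min_sum_rate_bits([1.0, 1.0, 1.0], [0.5, 0.5], [0.25, 0.25, 0.25]))
```

Each term of the sum is the rate one ideal Gaussian quantizer spends on one frame, which is why every frame rate is positive inside $\mathcal{D}^{\text{C-C}}$.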



The proof of Corollary 1.2 is given in Appendix I. The form of (3.6) suggests the following idealized (achievable) coding scheme, which is explained with reference to Fig. 4 and the upper-bound argument in the proof of Corollary 1.2 in Appendix I. Encoder-1 initially quantizes $X_1$ into $\hat{X}_1$ to meet the target MSE $D_1$ using an ideal Gaussian rate-distortion quantizer, and decoder-1 recovers $\hat{X}_1$. Since the quantizer is ideal, the joint distribution of $(X_1,\hat{X}_1)$ will follow the test-channel distribution of the rate-distortion function for a memoryless Gaussian source [16, pp. 345, 370]. This idealization holds in the limit as the blocklength $n$ tends to infinity. Let $W_1 := X_1$ and $\hat{W}_1 := \hat{X}_1$. Next, encoder-2 makes the causal minimum mean squared error (MMSE) prediction of $X_2$ based on $\hat{X}_1$ and quantizes the prediction error $W_2$ into $\hat{W}_2$ using an ideal Gaussian rate-distortion quantizer, so that decoder-2 can form $\hat{X}_2$ to meet the target MSE $D_2$ with help from $\hat{W}_1$. Because the rate-distortion quantizer is ideal, the asymptotic per-component variance of $W_2$ will be consistent with (3.5). Specifically, decoder-2 recovers $\hat{W}_2$ and creates the reproduction $\hat{X}_2$ as the causal MMSE estimate of $X_2$ based on $\hat{W}^2$. Finally, encoder-3 makes the causal MMSE prediction of $X_3$ based on $\hat{W}^2$ and quantizes the prediction error $W_3$ into $\hat{W}_3$ using an ideal Gaussian rate-distortion quantizer, so that decoder-3 can form $\hat{X}_3$ to meet the target MSE $D_3$ with help from $\hat{W}^2$. Decoder-3 recovers $\hat{W}_3$ and makes the reproduction $\hat{X}_3$ as the MMSE estimate of $X_3$ based on $\hat{W}^3$. The C-C coding scheme just described is an idealized version of DPCM (see [1]-[3], [5], [6] and references therein) because the rate-distortion quantizer is idealized. The above arguments lead to the following corollary.
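The scheme can be checked by simulation. The Python sketch below is a Monte Carlo illustration under stated assumptions, not an implementation of a practical coder: the "ideal Gaussian rate-distortion quantizer" is modeled by the forward form of the Gaussian test channel, $\hat{W} = aW + N'$ with $a = 1 - D/\sigma_W^2$ and $\mathrm{Var}(N') = aD$, which yields MSE exactly $D$. Each loop iteration is one spatial sample; the parameter values and function names are hypothetical. The sketch confirms that the per-frame MSEs are close to $D_j$ and that the prediction-error variances agree with (3.5).

```python
import math
import random


def ideal_quantizer(w, var_w, d, rng):
    """Forward form of the Gaussian rate-distortion test channel at MSE d <= var_w.

    Returns w_hat = a * w + noise with a = 1 - d / var_w and Var(noise) = a * d,
    so that E[(w - w_hat)^2] = d and the error is uncorrelated with w_hat.
    """
    a = 1.0 - d / var_w
    return a * w + rng.gauss(0.0, math.sqrt(a * d))


def simulate_dpcm(sigma, rho, D, n=100_000, seed=1):
    """Monte Carlo sketch of idealized DPCM for a 3-frame Gauss-Markov source.

    Returns (empirical per-frame MSEs, empirical prediction-error variances,
    prediction-error variances predicted by (3.5)).
    """
    rng = random.Random(seed)
    # sigma_{W_1}^2 = sigma_1^2; sigma_{W_2}^2 and sigma_{W_3}^2 follow (3.5).
    var_w = [sigma[0] ** 2] + [
        rho[j - 1] ** 2 * sigma[j] ** 2 / sigma[j - 1] ** 2 * D[j - 1]
        + (1 - rho[j - 1] ** 2) * sigma[j] ** 2
        for j in (1, 2)
    ]
    sq_err, sq_w = [0.0] * 3, [0.0] * 3
    for _ in range(n):
        # One spatial sample: X_{j+1} = rho_j (sigma_{j+1} / sigma_j) X_j + N_j.
        x = [rng.gauss(0.0, sigma[0])]
        for j in (1, 2):
            x.append(rho[j - 1] * sigma[j] / sigma[j - 1] * x[j - 1]
                     + rng.gauss(0.0, sigma[j] * math.sqrt(1 - rho[j - 1] ** 2)))
        x_hat = []
        for j in range(3):
            # Causal MMSE prediction, formed from the previous reproduction (zero for frame 1).
            pred = 0.0 if j == 0 else rho[j - 1] * sigma[j] / sigma[j - 1] * x_hat[j - 1]
            w = x[j] - pred  # prediction error W_j
            x_hat.append(pred + ideal_quantizer(w, var_w[j], D[j], rng))
            sq_err[j] += (x[j] - x_hat[j]) ** 2
            sq_w[j] += w * w
    return [e / n for e in sq_err], [s / n for s in sq_w], var_w


D = (0.1, 0.1, 0.1)
mse, emp_var_w, var_w = simulate_dpcm((1.0, 1.0, 1.0), (0.9, 0.9), D)
sum_rate_bits = sum(0.5 * math.log2(v / d) for v, d in zip(var_w, D))
```

Adding the per-frame rates $\frac{1}{2}\log_2(\sigma_{W_j}^2/D_j)$ reproduces the right-hand side of (3.6), which is the point of Corollary 1.3.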



Fig. 4. Illustrating idealized DPCM (encoder: causal MMSE predictor, prediction error sent at rate $R_j$; decoder: causal MMSE estimator).



Corollary 1.3 (C-C optimality of idealized DPCM for Gauss-Markov sources and MSE) The C-C minimum sum-rate-MSE performance for Gauss-Markov sources is achieved by idealized DPCM for all distortion tuples $\mathbf{D}$ in the distortion region $\mathcal{D}^{C\text{-}C}$.

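The distortion region $\mathcal{D}^{C\text{-}C}$ is the set of distortion tuples for which the DPCM encoder uses a positive rate for each frame. Note that $\mathcal{D}^{C\text{-}C}$ has a non-zero volume for nonsingular sources ($\sigma_j \neq 0$, $\rho_j \neq \pm 1$): since $\sigma_{W_j}^2 \ge (1-\rho_{j-1}^2)\sigma_j^2 > 0$ for $j = 2,3$, the region contains a cube of positive side length. Hence, the assertion that DPCM is optimal for C-C systems is a nontrivial statement.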



IV. Results for the 3-stage JC system

Theorem 2 (JC rate-distortion function, [17, Problem 14, p. 134]) The single-letter rate-distortion function for the joint coding system is given by

$$R^{JC}(\mathbf{D}) = \min_{E[d_j(X_j,\hat{X}_j)] \le D_j,\ j=1,2,3} I(X^3;\hat{X}^3). \qquad (4.7)$$



Compared to $R_{\mathrm{sum}}^{C\text{-}C}(\mathbf{D})$ given by (3.4), the JC rate-distortion function $R^{JC}(\mathbf{D})$ given by (4.7), which has no Markov chain constraints, is a lower bound for $R_{\mathrm{sum}}^{C\text{-}C}(\mathbf{D})$. While this follows from a direct comparison of the single-letter rate-distortion functions, the operational structure of the C-C, C-NC, NC-C, and JC systems makes it clear that the JC rate-distortion function is in fact a lower bound for the sum-rates of all delayed sequential coding systems.

Similar to Corollary 1.2, which is for a C-C system, Gaussian sources, and MSE distortion criteria, we have the following corollary for a JC system.



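Corollary 2.1 (JC rate-MSE function for Gauss-Markov sources)

(i) For the distortion region $\mathcal{D}^{JC} := \{\mathbf{D} \mid (\Sigma_X - \mathrm{diag}(\mathbf{D})) \succeq 0\}$, the JC rate-MSE function for jointly Gaussian sources is given by

$$R^{JC\,GM}(\mathbf{D}) = \frac{1}{2}\log\frac{|\Sigma_X|}{D_1D_2D_3}. \qquad (4.8)$$

(ii) For the distortion region $\mathcal{D}^{JC}$, the JC rate-MSE function for Gauss-Markov sources is given by

$$R^{JC\,GM}(\mathbf{D}) = \frac{1}{2}\log\frac{\sigma_1^2}{D_1} + \frac{1}{2}\log\frac{\sigma_2^2(1-\rho_1^2)}{D_2} + \frac{1}{2}\log\frac{\sigma_3^2(1-\rho_2^2)}{D_3}. \qquad (4.9)$$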



Formula (4.8) is the Shannon lower bound [2], [3] of the JC rate-distortion function. It can be achieved in the distortion region $\mathcal{D}^{JC}$ by the test channel

$$\hat{X} + Z = X, \qquad (4.10)$$

where $Z = (Z_1,Z_2,Z_3)$ and $\hat{X} = (\hat{X}_1,\hat{X}_2,\hat{X}_3)$ are independent Gaussian vectors with covariance matrices

$$\Sigma_Z = \mathrm{diag}(\mathbf{D}), \qquad \Sigma_{\hat{X}} = \Sigma_X - \mathrm{diag}(\mathbf{D}),$$

and $X = (X_1,X_2,X_3)$. The existence of this channel is guaranteed by the definition of $\mathcal{D}^{JC}$.
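The two conditions in Corollary 2.1 are easy to evaluate numerically. The Python sketch below, with hypothetical function names, tests whether $\Sigma_X - \mathrm{diag}(\mathbf{D})$ is positive definite by attempting a Cholesky factorization, and computes (4.8) from $\log|\Sigma_X| = 2\sum_i \log L_{ii}$. Note two assumptions: the membership test is the strict (positive definite) version, so boundary points of $\mathcal{D}^{JC}$ are rejected, and rates are reported in bits.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric matrix (lists of lists).

    Raises ValueError if the matrix is not (numerically) positive definite.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def in_jc_region_strict(cov, D):
    """True if Sigma_X - diag(D) is positive definite (strict version of D^{JC})."""
    m = [[cov[i][j] - (D[i] if i == j else 0.0) for j in range(len(D))]
         for i in range(len(D))]
    try:
        cholesky(m)
    except ValueError:
        return False
    return True


def jc_rate_bits(cov, D):
    """(4.8) in bits: 0.5 * log2(|Sigma_X| / (D_1 D_2 D_3))."""
    L = cholesky(cov)
    log2_det = 2.0 * sum(math.log2(L[i][i]) for i in range(len(cov)))
    return 0.5 * (log2_det - sum(math.log2(d) for d in D))


cov = [[1.0, 0.9, 0.81], [0.9, 1.0, 0.9], [0.81, 0.9, 1.0]]  # sigma_j = 1, rho_j = 0.9
D = (0.05, 0.05, 0.05)
if in_jc_region_strict(cov, D):
    print(jc_rate_bits(cov, D))
```

For this Gauss-Markov example $|\Sigma_X| = 0.19^2$, so the value agrees with (4.9): $\frac{1}{2}\log_2 20 + \log_2 3.8$ bits.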
Comparing (3.6) and (4.9) for $\mathbf{D} \in \mathcal{D}^{JC} \cap \mathcal{D}^{C\text{-}C}$, which generally has a nonempty interior, we find that in general the C-C sum-rate $R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D})$ is strictly greater than the JC rate $R^{JC\,GM}(\mathbf{D})$. However, as $\mathbf{D} \to \mathbf{0}$, the two rates are asymptotically equal, because $\sigma_{W_j}^2 \to (1-\rho_{j-1}^2)\sigma_j^2$ as $D_{j-1} \to 0$.
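Subtracting (4.9) from (3.6), the $\sigma_1^2/D_1$ terms cancel and the gap is $\sum_{j=2}^{3}\frac{1}{2}\log\big(\sigma_{W_j}^2/((1-\rho_{j-1}^2)\sigma_j^2)\big)$. The Python sketch below evaluates this gap (in bits, an assumed unit) for hypothetical parameters and shrinking distortions chosen inside $\mathcal{D}^{JC} \cap \mathcal{D}^{C\text{-}C}$; the sketch does not itself check membership.

```python
import math


def cc_minus_jc_gap_bits(sigma, rho, D):
    """(3.6) minus (4.9), in bits, for D assumed to lie in D^{JC} and D^{C-C}."""
    gap = 0.0
    for j in (1, 2):  # 0-based indices of frames 2 and 3
        var_w = (rho[j - 1] ** 2 * sigma[j] ** 2 / sigma[j - 1] ** 2 * D[j - 1]
                 + (1 - rho[j - 1] ** 2) * sigma[j] ** 2)  # (3.5)
        gap += 0.5 * math.log2(var_w / ((1 - rho[j - 1] ** 2) * sigma[j] ** 2))
    return gap


gaps = [cc_minus_jc_gap_bits((1.0, 1.0, 1.0), (0.9, 0.9), (d, d, d))
        for d in (0.05, 0.005, 0.0005)]
print(gaps)  # positive, and shrinking toward zero as D -> 0
```

The gap is strictly positive whenever $\rho_{j-1} \neq 0$ and $D_{j-1} > 0$, and it vanishes only in the limit of vanishing distortion.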

We would like to draw some parallels between C-C sequential coding of correlated sources and Slepian-Wolf distributed coding of correlated sources [16]. In the Slepian-Wolf coding problem we have spatially correlated sources, temporal asymptotics, and a distributed coding constraint. In the C-C sequential coding problem we have temporally correlated sources, spatial asymptotics, and a sequential coding constraint. The roles of time and space are approximately exchanged. In Slepian-Wolf coding, the sources $X_1$, $X_2$, and $X_3$ can be individually encoded at the rates $H(X_1)$, $H(X_2|X_1)$, and $H(X_3|X_2)$, respectively, and decoded sequentially by first reconstructing $X_1$, then $X_2$, and finally $X_3$ (see Fig. 5). For Markov sources $X_1 - X_2 - X_3$, the sum-rate is equal to the joint entropy of the three sources, which is the rate required for jointly coding the three sources. The fact that the C-C sum-rate approaches the JC sum-rate as $\mathbf{D} \to \mathbf{0}$ is consistent with the fact that, in the Slepian-Wolf coding problem, sequential encoding and decoding does not entail a rate-loss with respect to joint coding. As $\mathbf{D} \to \mathbf{0}$, we are approaching near-lossless compression.



Fig. 5. Slepian-Wolf coding with sequential decoding.



V. Results for the 3-stage C-NC system

Similar to Theorem 1 and Corollary 1.1 for the C-C system, the rate-distortion and sum-rate regions for a C-NC system are characterized by Theorem 3 and Corollary 3.1, respectively, as follows.

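Theorem 3 (C-NC rate-distortion region) The single-letter rate-distortion region for a C-NC system with one-stage decoding frame-delay is given by

$$\begin{aligned}
\mathcal{R}^{C\text{-}NC} = \Big\{(\mathbf{R},\mathbf{D}) \;\Big|\; & \exists\, U^2, \hat{X}_2^3, g_1(\cdot,\cdot) \text{ s.t.}\\
& R_1 \ge I(X_1;U_1),\\
& R_2 \ge I(X^2;U_2 \mid U_1),\\
& R_3 \ge I(X^3;\hat{X}_2^3 \mid U^2),\\
& D_j \ge E[d_j(X_j,\hat{X}_j)],\ j = 1,2,3,\\
& \hat{X}_1 = g_1(U_1,U_2),\\
& U_1 - X_1 - X_2^3,\ U_2 - (X^2,U_1) - X_3 \Big\} \qquad (5.11)
\end{aligned}$$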
where $g_1(\cdot,\cdot)$ is a deterministic function and $(U_1,U_2)$ are auxiliary random variables satisfying the cardinality bounds

$$|\mathcal{U}_1| \le |\mathcal{X}_1| + 6, \qquad |\mathcal{U}_2| \le |\mathcal{X}_1|^2|\mathcal{X}_2| + 6|\mathcal{X}_1||\mathcal{X}_2| + 5.$$

Note that $\mathcal{R}^{C\text{-}C} \subseteq \mathcal{R}^{C\text{-}NC}$ because the encoders and decoders of a C-C system can also be used in a C-NC system. As in Theorem 1, the characterization of the rate-distortion region given in Theorem 3 is both convex and closed, so there is no need to take the convex hull and closure.

The proof of the forward part of Theorem 3 is given in Appendix II. The region in Theorem 3 has the following natural interpretation. First, $X_1$ is quantized to $U_1$ using random codebook-1 for encoder-1 without access to $X_2$. Next, the tuple $(X^2,U_1)$ is (jointly) quantized to $U_2$ without access to $X_3$ using random codebook-2 for encoder-2. The codewords are further randomly distributed into bins, and the bin index of $U_2$ is sent to the decoder. Decoder-1 recovers $U_1$ from the message sent by encoder-1. It then identifies $U_2$ from the bin with the help of $U_1$ as side-information and reproduces $X_1$ as $\hat{X}_1 = g_1(U_1,U_2)$. Finally, encoder-3 (jointly) quantizes $(X^3,U^2)$ into $\hat{X}_2^3$ using encoder-3's random codebook, bins the codewords, and sends the bin index such that decoder-2 and decoder-3 can identify $\hat{X}_2^3$ with the help of $U^2$ as side-information available from decoders 1 and 2. The constraints on the rates and the Markov chains ensure that, with high probability (for all large enough $n$), both encoding (quantization) and decoding (recovery) succeed and the recovered words are jointly strongly typical with the source words, so that the target distortions are met.

The (weak) converse part of Theorem 3 is proved in Appendix III using standard information inequalities, by defining auxiliary random variables $U_j(i) = (S^j, X^j(i^-))$, $j = 1,2$, where $S_j$ denotes the message sent by the $j$-th encoder, which satisfy all the Markov-chain and distortion constraints, together with a convexification (time-sharing) argument as in [16, p. 397]. The cardinality bounds of the auxiliary random variables are also derived in the appendices using the Carathéodory theorem.

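Corollary 3.1 (C-NC sum-rate region) The sum-rate region for the one-stage delayed C-NC system is $\mathcal{R}_{\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}) = [R_{\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}), \infty)$, where the minimum sum-rate is

$$R_{\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}) = \min_{\substack{E[d_j(X_j,\hat{X}_j)] \le D_j,\ j=1,2,3,\\ \hat{X}_1 - X^2 - X_3}} I(X^3;\hat{X}^3). \qquad (5.12)$$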



Proof: The proof is similar to that of Corollary 1.1. The main simplification is the absence of the auxiliary random variables $U^2$. For any point $(\mathbf{R},\mathbf{D}) \in \mathcal{R}^{C\text{-}NC}$, there exist auxiliary random variables and functions satisfying all the constraints in $\mathcal{R}^{C\text{-}NC}$. Since the Markov chains $U_1 - X_1 - X_2^3$ and $U_2 - (X^2,U_1) - X_3$ hold, and $\hat{X}_1$ is a function of $U^2$, we have

$$\begin{aligned}
R_1 + R_2 + R_3 &\ge I(X_1;U_1) + I(X^2;U_2 \mid U_1) + I(X^3;\hat{X}_2^3 \mid U^2)\\
&= I(X^3;U_1) + I(X^3;U_2 \mid U_1) + I(X^3;\hat{X}_2^3 \mid U^2)\\
&= I(X^3;U^2,\hat{X}_2^3)\\
&= I(X^3;U^2,\hat{X}^3)\\
&\ge I(X^3;\hat{X}^3).
\end{aligned}$$
It can be verified that the Markov chain $\hat{X}_1 - X^2 - X_3$ holds. Therefore the right-hand side of (5.12) is not greater than the minimum sum-rate.

On the other hand, because $\{U_1 = 0, U_2 = \hat{X}_1\}$ is a possible choice of $(U_1,U_2)$,

$$R_{\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}) = \min I(X^3;U^2,\hat{X}_2^3) \le \min I(X^3;\hat{X}^3),$$

where the first minimization is subject to the constraints in (5.11) and the second minimization is subject to the constraints in (5.12). Therefore (5.12) holds. ■
As noted earlier, the JC rate-distortion function (4.7), which has no Markov chain constraints, is a lower bound for $R_{\mathrm{sum}}^{C\text{-}NC}(\mathbf{D})$. Remarkably, for Gauss-Markov sources and certain nontrivial MSE tuples $\mathbf{D}$ discussed below, $R_{\mathrm{sum}}^{C\text{-}NC}(\mathbf{D})$ coincides with the JC rate $R^{JC}(\mathbf{D})$.

Corollary 3.2 (JC-optimality of a one-stage delayed C-NC system for Gauss-Markov sources and MSE) For all distortion tuples $\mathbf{D}$ belonging to the distortion region $\mathcal{D}^{JC}$ defined in Section IV, Corollary 2.1(i), we have

$$R_{\mathrm{sum}}^{C\text{-}NC\,GM}(\mathbf{D}) = R^{JC\,GM}(\mathbf{D}).$$

Proof: The JC rate-distortion function is achieved by the test channel (4.10) in the distortion region $\mathcal{D}^{JC}$. We will verify that the Markov chain $\hat{X}_1 - X^2 - X_3$ holds for this test channel.

Note that because all the variables are jointly Gaussian, they have the property that $A \perp B$ and $A \perp C$ imply $A \perp (B,C)$ for any Gaussian vector $(A,B,C)$.

By the Markov chain $X_1 - X_2 - X_3$, the MMSE estimate of $X_3$ based on $X_1$ and $X_2$ yields the decomposition

$$X_3 = \rho_2\frac{\sigma_3}{\sigma_2}X_2 + N, \qquad (5.13)$$

where $N$ is Gaussian and independent of $(X_1,X_2)$.

By the structure of the test channel, $Z_1 \perp (Z_2,Z_3,\hat{X}_2,\hat{X}_3)$ implies $Z_1 \perp (X_2,X_3)$, which further implies $Z_1 \perp N$. Moreover, because $N \perp (X_1,Z_1)$, we have $N \perp \hat{X}_1$. Therefore $N \perp (X_1,X_2,\hat{X}_1)$. So the best estimate of $X_3$ based on $(X_1,X_2,\hat{X}_1)$ is still given by (5.13). It follows that the Markov chain $X_3 - X_2 - (X_1,\hat{X}_1)$ holds, which in turn implies that $\hat{X}_1 - X^2 - X_3$ holds and completes the proof. ■
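The Markov chain in this proof can also be checked numerically. Under the test channel (4.10), $\mathrm{Cov}(\hat{X}, X) = \Sigma_X - \mathrm{diag}(\mathbf{D})$, so the conditional covariance of $\hat{X}_1$ and $X_3$ given $(X_1,X_2)$ follows from a Schur complement, and for jointly Gaussian variables it vanishes exactly when $\hat{X}_1 - X^2 - X_3$ holds. The Python sketch below (hypothetical names and parameters, with $\mathbf{D}$ assumed to lie in $\mathcal{D}^{JC}$ so that the test channel exists) computes this quantity for a Gauss-Markov $\Sigma_X$; as a contrast, the test below also perturbs $\Sigma_X$ so the source is no longer Markov, and the conditional covariance becomes nonzero.

```python
def gauss_markov_cov(sigma, rho):
    """Sigma_X for a 3-frame Gauss-Markov source (the matrix given earlier)."""
    s1, s2, s3 = sigma
    r1, r2 = rho
    return [[s1 * s1, r1 * s1 * s2, r1 * r2 * s1 * s3],
            [r1 * s1 * s2, s2 * s2, r2 * s2 * s3],
            [r1 * r2 * s1 * s3, r2 * s2 * s3, s3 * s3]]


def cov_xhat1_x3_given_x12(cov, D):
    """Cov(Xhat_1, X_3 | X_1, X_2) under the test channel Xhat + Z = X of (4.10).

    Since Xhat and Z are independent, Cov(Xhat, X) = Sigma_X - diag(D); hence
    Cov(Xhat_1, X_1) = Sigma_11 - D_1 and Cov(Xhat_1, X_k) = Sigma_1k for k = 2, 3.
    For jointly Gaussian variables the result is zero iff Xhat_1 - X^2 - X_3 holds.
    """
    c_a_s = [cov[0][0] - D[0], cov[0][1]]      # Cov(Xhat_1, (X_1, X_2))
    c_s_b = [cov[0][2], cov[1][2]]             # Cov((X_1, X_2), X_3)
    a, b, c = cov[0][0], cov[0][1], cov[1][1]  # Cov((X_1, X_2)) = [[a, b], [b, c]]
    det = a * c - b * b
    # y = Cov((X_1, X_2))^{-1} Cov((X_1, X_2), X_3), via the explicit 2x2 inverse.
    y = [(c * c_s_b[0] - b * c_s_b[1]) / det, (a * c_s_b[1] - b * c_s_b[0]) / det]
    return cov[0][2] - (c_a_s[0] * y[0] + c_a_s[1] * y[1])


cov = gauss_markov_cov((1.0, 1.0, 1.0), (0.9, 0.9))
D = (0.05, 0.05, 0.05)  # assumed to lie in D^{JC} for these parameters
print(cov_xhat1_x3_given_x12(cov, D))  # ~0 up to floating-point rounding
```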

Recall that the JC rate-distortion function is a lower bound for the minimum sum-rate of all delayed sequential coding systems. Corollary 3.2 implies that, for Gauss-Markov sources and MSE tuples in the region $\mathcal{D}^{JC}$, the JC rate-distortion performance is achievable in terms of sum-rate with only a single frame of decoding delay. The first-order Markov assumption on the sources $X_1 - X_2 - X_3$ is essential for this optimality. An interpretation is that $X_2$ supplies all the help from $X_3$ to generate the optimum $\hat{X}_1$. More generally (for $T > 3$), as shown in Section VII, C-NC encoders need access to only the present and past frames together with one future frame to match the rate-distortion function of the JC system, in which all future frames are simultaneously available for encoding. Thus, the neighboring future frame supplies all the help from the entire future through the Markovian property of the sources. The benefit of one frame-delay is so significant that it is equivalent to arbitrary frame-delay for Gauss-Markov sources and MSE criteria when $\mathbf{D} \in \mathcal{D}^{JC}$.

It is of interest to compare Corollary 3.2 with the real-time source coding problem in [18]. In [18] it is shown that for Markov sources, a C-C encoder may ignore the previous sources and use only the current source and the decoder's memory without loss of performance. This is a purely structural result (with no spatial asymptotics and no computable single-letter information-theoretic characterizations) focused exclusively on C-C systems. In contrast, Corollary 3.2 is about achieving the JC-system performance with a C-NC system. Additionally, [18] deals with a frame-averaged expected distortion criterion, as opposed to the frame-specific individual distortion constraints treated here.

The JC-optimality of the one-stage delayed C-NC system is guaranteed to hold within the distortion region $\mathcal{D}^{JC}$, defined as the set of all distortion tuples $\mathbf{D}$ satisfying the positive semidefiniteness condition $(\Sigma_X - \mathrm{diag}(\mathbf{D})) \succeq 0$. For nonsingular sources, $\Sigma_X \succ 0 \Rightarrow \lambda_{\min}(\Sigma_X) > 0$, where $\lambda_{\min}(\Sigma_X)$ is the smallest eigenvalue of the positive definite symmetric (covariance) matrix $\Sigma_X$. For any point $\mathbf{D}$ in the closed hypercube $[0, \lambda_{\min}]^T$,

$$\Sigma_X - \mathrm{diag}(\mathbf{D}) = (\Sigma_X - \lambda_{\min}I) + \mathrm{diag}(\lambda_{\min}\mathbf{e} - \mathbf{D}),$$

where $I$ is the identity matrix and $\mathbf{e} = (1,\dots,1)$ is the all-one vector. Because both terms are positive semidefinite matrices, the sum is also positive semidefinite. Therefore $\mathcal{D}^{JC}$ contains this hypercube, which has a strictly positive volume in $\mathbb{R}^T$, so $\mathcal{D}^{JC}$ has a non-zero volume. Hence, the JC-optimality of a C-NC system with one-stage decoding delay discussed here is a nontrivial assertion. $\mathcal{D}^{JC}$ includes all distortion tuples with components below certain thresholds corresponding to "sufficiently good" reproduction qualities. However, it should be noted that this is not a high-rate (vanishing distortion) asymptotic.
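The hypercube argument can be illustrated for $T = 3$. The Python sketch below computes $\lambda_{\min}(\Sigma_X)$ with the closed-form trigonometric eigenvalue formula for real symmetric $3 \times 3$ matrices (a standard method, swapped in here in place of a general eigensolver) and then checks, at random points strictly inside $[0, \lambda_{\min}]^3$, that $\Sigma_X - \mathrm{diag}(\mathbf{D})$ is positive definite. The example matrix and function names are hypothetical.

```python
import math
import random


def sym3_eigenvalues(A):
    """Ascending eigenvalues of a real symmetric 3x3 matrix (closed-form method)."""
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted([A[0][0], A[1][1], A[2][2]])
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p2 = sum((A[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))
    phi = math.acos(r) / 3.0
    largest = q + 2.0 * p * math.cos(phi)
    smallest = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [smallest, 3.0 * q - largest - smallest, largest]


def is_positive_definite(M):
    """Cholesky-based test; boundary points of the PSD cone are rejected."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


cov = [[1.0, 0.9, 0.81], [0.9, 1.0, 0.9], [0.81, 0.9, 1.0]]  # sigma_j = 1, rho_j = 0.9
lam_min = sym3_eigenvalues(cov)[0]
rng = random.Random(0)
for _ in range(1000):
    # Random points strictly inside [0, lam_min]^3 (the factor 0.999 avoids the boundary).
    D = [0.999 * lam_min * rng.random() for _ in range(3)]
    M = [[cov[i][j] - (D[i] if i == j else 0.0) for j in range(3)] for i in range(3)]
    assert is_positive_definite(M)
```

For this example $\lambda_{\min} = (2.81 - \sqrt{2.81^2 - 0.76})/2 \approx 0.069$, so the guaranteed hypercube is small but has positive volume; $\mathcal{D}^{JC}$ itself is generally larger.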

On the other hand, the JC-optimality of a C-NC system with one-stage decoding frame-delay does not hold for all distortion tuples, as the following counterexample shows.

Counterexample: Consider Gauss-Markov sources $X^3$ with $X_1 = X_2$ and an MSE tuple $\mathbf{D}$ with $D_1 = D_2 = D$. The JC problem reduces to a two-stage JC problem in which the encoder jointly quantizes $(X_1,X_3)$ into $(\hat{X}_1,\hat{X}_3)$ and the decoder simply sets $\hat{X}_2 = \hat{X}_1$. However, the C-NC problem reduces to a two-stage C-C problem with sources $(X_1,X_3)$, because the first two C-NC encoders are operationally equivalent to the first C-C encoder observing $X_1$, and the last C-NC encoder is operationally equivalent to the second C-C encoder observing all sources. As mentioned in the second-to-last paragraph of Section IV, generally speaking, a two-stage C-C system does not match (in sum-rate) the JC rate-distortion performance. Therefore the three-stage C-NC system also does not match the JC performance for these specific sources and certain distortion tuples $\mathbf{D}$. Note that these sources are actually singular ($\Sigma_X$ has a zero eigenvalue) and $\mathcal{D}^{JC}$ only contains trivial points (either $D_1 = 0$ or $D_2 = 0$). So for the nontrivial distortion tuples $\mathbf{D}$ described above (which do not belong to $\mathcal{D}^{JC}$), the JC-optimality of a C-NC system with a one-stage decoding delay fails to hold.

To construct a counterexample with nonsingular sources, one can slightly perturb $\Sigma_X$ so that it becomes positive definite. The JC rate and the C-NC sum-rate change only by limited amounts, due to continuity properties of the sum-rate-distortion function with respect to the source distributions (similar to [17, Lemma 2.2, p. 124]). Therefore we can find a small enough perturbation such that the rates do not match.

The JC-optimality of the one-stage delayed C-NC system is not a unique property of Gaussian sources and MSE. It also holds for symmetrically correlated binary sources with a Hamming distortion. These sources can be described as follows. Let $X_1, N_1, N_2$ be mutually independent Ber(1/2), Ber($p_1$), and Ber($p_2$) random variables, respectively, and let $X_2 = X_1 \oplus N_1$ and $X_3 = X_2 \oplus N_2$, where $\oplus$ denotes the Boolean exclusive-OR operation. One can verify that the sum-rate-distortion performance of a C-NC system matches the JC rate-distortion performance for these sources and Hamming distortion within a certain distortion region of nonzero volume. We omit the proof because it is cumbersome.
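The binary source model itself can be checked exactly by enumeration. The Python sketch below (hypothetical names, assumed values $p_1 = 0.1$, $p_2 = 0.2$) builds the joint pmf of $(X_1,X_2,X_3)$, verifies the Markov chain $X_1 - X_2 - X_3$ through $p(x_1,x_2,x_3)\,p(x_2) = p(x_1,x_2)\,p(x_2,x_3)$, and confirms that $X_1$ and $X_3$ differ with probability $p_1(1-p_2) + p_2(1-p_1)$. It illustrates only the source model, not the omitted rate-distortion comparison.

```python
from itertools import product


def joint_pmf(p1, p2):
    """Exact pmf of (X1, X2, X3): X1 ~ Ber(1/2), X2 = X1 xor N1, X3 = X2 xor N2."""
    pmf = {}
    for x1, n1, n2 in product((0, 1), repeat=3):
        x2 = x1 ^ n1
        x3 = x2 ^ n2
        # (x1, n1, n2) -> (x1, x2, x3) is one-to-one, so each key is set once.
        pmf[(x1, x2, x3)] = 0.5 * (p1 if n1 else 1 - p1) * (p2 if n2 else 1 - p2)
    return pmf


def is_markov_chain(pmf, tol=1e-12):
    """Check X1 - X2 - X3 via p(x1, x2, x3) p(x2) = p(x1, x2) p(x2, x3)."""
    p_2, p_12, p_23 = {}, {}, {}
    for (a, b, c), pr in pmf.items():
        p_2[b] = p_2.get(b, 0.0) + pr
        p_12[(a, b)] = p_12.get((a, b), 0.0) + pr
        p_23[(b, c)] = p_23.get((b, c), 0.0) + pr
    return all(abs(pr * p_2[b] - p_12[(a, b)] * p_23[(b, c)]) <= tol
               for (a, b, c), pr in pmf.items())


pmf = joint_pmf(0.1, 0.2)
flip_13 = sum(pr for (a, _, c), pr in pmf.items() if a != c)
```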

VI. Results for the 3-stage NC-C system 

We can derive the rate-distortion region for an NC-C system by mimicking the derivations for the C-NC system discussed so far. However, due to the operational structural relationship between C-NC and NC-C systems, it is not necessary to re-derive the results for the NC-C system at certain operating points, in particular for the sum-rate region:

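Theorem 4 ("Equivalence" of C-NC and NC-C rate-distortion regions)

(i) The rate-distortion region for the one-stage delayed NC-C system is given by

$$\begin{aligned}
\mathcal{R}^{NC\text{-}C} = \Big\{(\mathbf{R},\mathbf{D}) \;\Big|\; & \exists\, U^2, \hat{X}_3, g_1(\cdot), g_2(\cdot,\cdot) \text{ s.t.}\\
& R_1 \ge I(X^2;U_1),\\
& R_2 \ge I(X^3;U_2 \mid U_1),\\
& R_3 \ge I(X^3;\hat{X}_3 \mid U^2),\\
& D_j \ge E[d_j(X_j,\hat{X}_j)],\ j = 1,2,3,\\
& \hat{X}_1 = g_1(U_1),\ \hat{X}_2 = g_2(U_1,U_2),\\
& U_1 - X^2 - X_3 \Big\}
\end{aligned}$$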

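with the following cardinality bounds

$$|\mathcal{U}_1| \le |\mathcal{X}_1||\mathcal{X}_2| + 6, \qquad |\mathcal{U}_2| \le |\mathcal{X}_1|^2|\mathcal{X}_2|^2|\mathcal{X}_3| + 6|\mathcal{X}_1||\mathcal{X}_2||\mathcal{X}_3| + 4.$$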

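(ii) For an arbitrary distortion tuple $\mathbf{D}$, the rate regions $\mathcal{R}^{NC\text{-}C}$ and $\mathcal{R}^{C\text{-}NC}$ are related in the following manner:

$$(R_1,R_2,R_3,\mathbf{D}) \in \mathcal{R}^{C\text{-}NC} \Rightarrow (R_1+R_2, R_3, 0, \mathbf{D}) \in \mathcal{R}^{NC\text{-}C},$$

$$(R_1,R_2,R_3,\mathbf{D}) \in \mathcal{R}^{NC\text{-}C} \Rightarrow (0, R_1, R_2+R_3, \mathbf{D}) \in \mathcal{R}^{C\text{-}NC}.$$

(iii) For an arbitrary distortion tuple $\mathbf{D}$, the minimum sum-rates of one-stage delayed C-NC and NC-C systems are equal:

$$R_{\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}) = R_{\mathrm{sum}}^{NC\text{-}C}(\mathbf{D}).$$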

The proof of part (i) is similar to that of Theorem 3. Part (ii) can be proved either by using the definitions of $\mathcal{R}^{C\text{-}NC}$ and $\mathcal{R}^{NC\text{-}C}$ or, more directly, from the system structure (see Figs. 3(a) and (b)) as follows. Given any C-NC system with rate tuple $(R_1,R_2,R_3)$, we can construct an NC-C system as follows: (1) combine the first two C-NC encoders to get the first NC-C encoder, (2) use the third C-NC encoder as the second NC-C encoder, and (3) use a null encoder with constant zero output as the third NC-C encoder. Then we have an NC-C system with rate tuple $(R_1+R_2, R_3, 0)$ and the same distortion tuple. Similarly, given any NC-C system, we can use a null encoder as the first C-NC encoder and combine the last two NC-C encoders to get a C-NC system. Part (iii) follows from part (ii).

The (sum-rate) JC-optimality property of a C-NC system with one-stage decoding frame-delay given by Corollary 3.2 automatically holds for an NC-C system with one-stage encoding frame-delay. This relationship allows one to focus on the performance of C-NC systems alone, instead of both C-NC and NC-C systems, without loss of generality. This structural principle holds for the general multi-frame problem with multi-stage frame-delay, as discussed in Section VIII.
VII. General C-NC results

The $T$-stage C-NC system with a $k$-stage decoding delay is a natural generalization of the 3-stage C-NC system with one-stage decoding delay. For $j = 1,\dots,T$, encoder-$j$ observes the current and all the past sources $X^j$ and encodes them at rate $R_j$. Decoder-$j$ observes all the messages sent by encoders one through $\min\{j+k, T\}$ and reconstructs $X_j$. As an example, we present the diagram of a C-NC system with $T = 4$ frames and a $k = 2$-stage decoding delay in Fig. 6. Similar to Theorem 3 and Corollary 3.1, we have the following results.
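The information pattern described above can be written down directly. The Python sketch below (a hypothetical helper, not from the paper) lists, for a $T$-stage C-NC system with a $k$-stage decoding delay, which frames each encoder observes and which encoders' messages each decoder uses. It makes two observations easy to see: with $k = 0$ it reduces to a C-C system, and every decoder $j \ge T-k$ sees all $T$ messages, which is why the reproductions $\hat{X}_{T-k}^T$ appear jointly in Theorem 5.

```python
def c_nc_access(T, k):
    """Information pattern of a T-stage C-NC system with a k-stage decoding delay.

    Returns (encoder_frames, decoder_messages): encoder-j observes frames 1..j,
    and decoder-j uses the messages of encoders 1..min(j + k, T).
    """
    encoder_frames = {j: list(range(1, j + 1)) for j in range(1, T + 1)}
    decoder_messages = {j: list(range(1, min(j + k, T) + 1)) for j in range(1, T + 1)}
    return encoder_frames, decoder_messages


enc, dec = c_nc_access(T=4, k=2)  # the configuration of Fig. 6
print(dec)
```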



Fig. 6. A 4-stage C-NC system with a 2-stage decoding delay.



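Theorem 5 (General C-NC rate region) The rate region for the $T$-stage C-NC system with a $k$-stage decoding delay is given by

$$\begin{aligned}
\mathcal{R}_k^{C\text{-}NC} = \Big\{(\mathbf{R},\mathbf{D}) \;\Big|\; & \exists\, U^{T-1}, \hat{X}_{T-k}^T, g^{T-k-1}(\cdot) \text{ s.t.}\\
& R_j \ge I(X^j;U_j \mid U^{j-1}),\ j = 1,\dots,(T-1),\\
& R_T \ge I(X^T;\hat{X}_{T-k}^T \mid U^{T-1}),\\
& U_j - (X^j,U^{j-1}) - X_{j+1}^T,\ j = 1,\dots,(T-1),\\
& \hat{X}_j = g_j(U^{j+k}),\ j = 1,\dots,(T-k-1),\\
& D_j \ge E[d_j(X_j,\hat{X}_j)],\ j = 1,\dots,T \Big\}
\end{aligned}$$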



with the following cardinality bounds

$$|\mathcal{U}_j| \le \prod_{i=1}^{j}|\mathcal{X}_i|\prod_{i=1}^{j-1}|\mathcal{U}_i| + 2T, \quad j = 1,\dots,T-1.$$

The cardinality bounds for the alphabets of the auxiliary random variables stated in Theorem 5 are obtained by a loose counting of constraints (see the appendices). These bounds can be improved by some constants through a more careful counting of constraints. The first term $\prod_{i=1}^{j}|\mathcal{X}_i|\prod_{i=1}^{j-1}|\mathcal{U}_i|$ comes from the Markov chain constraints. The second term $2T$ comes from the $T$ rate constraints and the $T$ distortion constraints.
Corollary 5.1 (General C-NC sum-rate region) The sum-rate region of the general C-NC system is given by $\mathcal{R}_{k,\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}) = [R_{k,\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}), \infty)$, where $R_{k,\mathrm{sum}}^{C\text{-}NC}(\mathbf{D})$ is the minimum value of $I(X^T;\hat{X}^T)$ subject to the distortion constraints $E[d_j(X_j,\hat{X}_j)] \le D_j$, $j = 1,\dots,T$, and the Markov chain constraints

$$\hat{X}_j - (X^{j+k},\hat{X}^{j-1}) - X_{j+k+1}^T, \quad j = 1,\dots,T-k-1. \qquad (7.14)$$

For general C-NC systems with increasing system frame-delays, the expressions of the minimum sum-rates contain the same objective function $I(X^T;\hat{X}^T)$ and distortion constraints $E[d_j(X_j,\hat{X}_j)] \le D_j$, $j = 1,\dots,T$, but a decreasing number of Markov chain constraints. In the limit of the maximum possible system frame-delay, $(T-1)$, which is the same as in a JC system, we get the JC rate-distortion function with purely distortion (no Markov chain) constraints. When $T = 2$, a one-stage delayed C-NC system is trivial in terms of the sum-rate-distortion function because it reduces to that of a 2-stage JC system. Note that this reduction holds for arbitrary source distributions and arbitrary distortion criteria. So nontrivial C-NC systems must have at least $T = 3$ frames. This is the motivation for choosing $T = 3$ to start the discussion of delayed sequential coding systems in the earlier sections. However, this type of reduction should be distinguished from the nontrivial reduction result of Corollary 3.2, which only holds for certain source distributions and distortion criteria.

Using the notation of directed information [19], [20],

$$I(A^N \to B^N) := \sum_{n=1}^{N} I(A^n;B_n \mid B^{n-1}),$$

and its generalization to $k$-directed information [21],

$$\begin{aligned}
I_k(A^N \to B^N) &:= I(A^N;B^N) - \sum_{n=k+1}^{N} I(B^{n-k};A_n \mid A^{n-1})\\
&= I(A^N;B^N) - I(0^kB^{N-k} \to A^N),
\end{aligned}$$

where $0^kB^{N-k}$ is the length-$N$ sequence $(0,\dots,0,B_1,\dots,B_{N-k})$, we can write the objective function of the minimization problem in Corollary 5.1 as follows:

$$I(X^T;\hat{X}^T) = I_{k+1}(X^T \to \hat{X}^T) + I(0^{k+1}\hat{X}^{T-k-1} \to X^T). \qquad (7.15)$$

The Markov chain constraints (7.14) are equivalent to the condition $I(0^{k+1}\hat{X}^{T-k-1} \to X^T) = 0$. So the minimum sum-rate can be reformulated as the minimum of the first term of (7.15), subject to the second term being zero and to the distortion constraints.
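For jointly Gaussian vectors these quantities have closed forms, since $I(A;B \mid C) = \frac{1}{2}\log\frac{|\Sigma_{AC}|\,|\Sigma_{BC}|}{|\Sigma_{ABC}|\,|\Sigma_C|}$. The Python sketch below (hypothetical names, natural logarithms, a randomly generated positive definite covariance for $(A_1,\dots,A_N,B_1,\dots,B_N)$) implements $I(A^N \to B^N)$ and $I_k(A^N \to B^N)$ exactly as defined above. As a consistency check it confirms that $I_1$ coincides with ordinary directed information, which follows from the chain-rule identity $I(A^N;B^N) = I(A^N \to B^N) + I(0B^{N-1} \to A^N)$, and that $I_N$ equals the mutual information.

```python
import math
import random


def logdet(cov, idx):
    """Natural log-determinant of the principal submatrix cov[idx][idx] (0 if idx is empty)."""
    n = len(idx)
    L = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = cov[idx[i]][idx[j]] - sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
                total += 2.0 * math.log(L[i][i])
            else:
                L[i][j] = s / L[j][j]
    return total


def cmi(cov, a, b, c=()):
    """I(A; B | C) in nats for jointly Gaussian variables selected by index lists."""
    a, b, c = list(a), list(b), list(c)
    return 0.5 * (logdet(cov, a + c) + logdet(cov, b + c)
                  - logdet(cov, a + b + c) - logdet(cov, c))


def directed(cov, A, B):
    """I(A^N -> B^N) = sum_{n=1}^{N} I(A^n; B_n | B^{n-1})."""
    return sum(cmi(cov, A[:n], [B[n - 1]], B[:n - 1]) for n in range(1, len(A) + 1))


def k_directed(cov, A, B, k):
    """I_k(A^N -> B^N) = I(A^N; B^N) - sum_{n=k+1}^{N} I(B^{n-k}; A_n | A^{n-1})."""
    N = len(A)
    penalty = sum(cmi(cov, B[:n - k], [A[n - 1]], A[:n - 1]) for n in range(k + 1, N + 1))
    return cmi(cov, A, B) - penalty


# Hypothetical example: a random positive definite covariance for (A_1..A_N, B_1..B_N).
N = 3
rng = random.Random(7)
M = [[rng.gauss(0.0, 1.0) for _ in range(2 * N)] for _ in range(2 * N)]
cov = [[sum(M[i][t] * M[j][t] for t in range(2 * N)) + (1.0 if i == j else 0.0)
        for j in range(2 * N)] for i in range(2 * N)]
A, B = list(range(N)), list(range(N, 2 * N))
print(directed(cov, A, B), k_directed(cov, A, B, 1), cmi(cov, A, B))
```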

As the generalization of Corollary 3.2, we have the following result for $k$-th order Gauss-Markov sources, where $X_1,\dots,X_T$ form a $k$-th order Markov chain.

Corollary 5.2 (JC-optimality of $k$-stage delayed C-NC systems for $k$-th order Gauss-Markov sources and MSE)

$$R_{k,\mathrm{sum}}^{C\text{-}NC\,GM}(\mathbf{D}) = R^{JC\,GM}(\mathbf{D})$$

for the distortion region $\mathcal{D}^{JC}$.

Proof: The proof is similar to that of Corollary 3.2. The JC rate-distortion function is achieved by the test channel (4.10) in the distortion region $\mathcal{D}^{JC}$. We will verify that the Markov chain $\hat{X}_j - (X^{j+k},\hat{X}^{j-1}) - X_{j+k+1}^T$ holds for $j = 1,\dots,(T-k-1)$.

By the $k$-th order Markov property of the sources, we have $X^j - X_{j+1}^{j+k} - X_{j+k+1}$. The MMSE estimate of $X_{j+k+1}$ based on $X^{j+k}$ gives

$$X_{j+k+1} = \sum_{m=1}^{k} a_m X_{j+m} + N, \qquad (7.16)$$

where $N$ is a Gaussian random variable which is independent of $X^{j+k}$, and $\{a_m\}$ are the coefficients of the MMSE estimate. By arguments similar to those used to show the independence of random variables in the proof of Corollary 3.2, it can be shown that $N$ is independent of $\hat{X}^j$. Therefore the best estimate of $X_{j+k+1}$ based on $(X^{j+k},\hat{X}^j)$ is still given by (7.16). It follows that the Markov chain $(X^j,\hat{X}^j) - X_{j+1}^{j+k} - X_{j+k+1}$ holds, which in turn implies that $\hat{X}_j - (X^{j+k},\hat{X}^{j-1}) - X_{j+k+1}$ holds and completes the proof. ■
This corollary shows that for $k$-th order Gauss-Markov sources, the JC sum-rate-MSE performance is achieved by the $k$-stage delayed C-NC system. Let $\mathcal{D}_d$ denote the distortion region for which the $d$-stage delayed C-NC sum-rate matches the JC rate for $k$-th order Gauss-Markov sources and MSE. This region keeps expanding with delay:

$$\mathcal{D}_k \subseteq \mathcal{D}_{k+1} \subseteq \cdots \subseteq \mathcal{D}_{T-1} = (\mathbb{R}^+)^T.$$

The last equality holds because the JC system itself has $(T-1)$-stage delay.

VIII. General NC-NC results 

We can consider general NC-NC systems with a $k_1$-stage delay on the encoder side and a $k_2$-stage delay on the decoder side. C-NC and NC-C systems are the special cases $k_1 = 0$ and $k_2 = 0$, respectively. As an example, in Fig. 7 we present the diagram of an NC-NC system with one-stage encoding delay and one-stage decoding delay ($T = 4$, $k_1 = k_2 = 1$). Although NC-NC systems appear to be structurally more complex, we can relate the rate-distortion region of NC-NC systems to that of C-NC systems using structural arguments as in Section VI. Denoting the rate region of the NC-NC systems described above by $\mathcal{R}_{k_1,k_2}^{NC\text{-}NC}$, we have the following result, which is similar to parts (ii) and (iii) of Theorem 4.



Fig. 7. A 4-stage NC-NC system with 1-stage encoding delay and 1-stage decoding delay. It has the same sum-rate-distortion performance as the system in Fig. 6.



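Theorem 6 (Relationship between general NC-NC and C-NC rate regions) For any distortion tuple $\mathbf{D}$,

(i) $(R_1,\dots,R_T,\mathbf{D}) \in \mathcal{R}_{k_1,k_2}^{NC\text{-}NC} \Rightarrow \big(0,\dots,0,R_1,\dots,R_{T-k_1-1},\sum_{j=T-k_1}^{T}R_j,\mathbf{D}\big) \in \mathcal{R}_{k_1+k_2}^{C\text{-}NC}$,

(ii) $(R_1,\dots,R_T,\mathbf{D}) \in \mathcal{R}_{k_1+k_2}^{C\text{-}NC} \Rightarrow \big(\sum_{j=1}^{k_1+1}R_j,R_{k_1+2},\dots,R_T,0,\dots,0,\mathbf{D}\big) \in \mathcal{R}_{k_1,k_2}^{NC\text{-}NC}$,

where both sequences of zeros contain $k_1$ zeros.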
This result can be proved by noting that the first $k_1+1$ C-NC encoders can be combined into the first NC-NC encoder, and the last $k_1+1$ NC-NC encoders can be combined into the last C-NC encoder, without affecting the reproduction of frames. As a consequence of this theorem, we have an exact equivalence between the sum-rates of the NC-NC and the C-NC systems.
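The two maps in Theorem 6 act only on rate tuples, so they are easy to state as code. The Python sketch below (hypothetical function names) implements both directions; each preserves the total rate, which is the content of the sum-rate equivalence. With $T = 3$, $k_1 = 1$ and $k_2 = 0$ (an NC-C system), the maps reduce to those in Theorem 4(ii).

```python
def nc_nc_to_c_nc(rates, k1):
    """Theorem 6(i): NC-NC rate tuple -> rate tuple of the (k1 + k2)-stage delayed C-NC system.

    k1 leading null encoders, then R_1..R_{T-k1-1}, then the last k1 + 1 rates summed.
    """
    T = len(rates)
    return [0.0] * k1 + list(rates[:T - k1 - 1]) + [sum(rates[T - k1 - 1:])]


def c_nc_to_nc_nc(rates, k1):
    """Theorem 6(ii): C-NC rate tuple -> rate tuple of the (k1, k2)-stage delayed NC-NC system.

    The first k1 + 1 rates summed, then R_{k1+2}..R_T, then k1 null encoders.
    """
    return [sum(rates[:k1 + 1])] + list(rates[k1 + 1:]) + [0.0] * k1


print(c_nc_to_nc_nc([1.0, 2.0, 3.0], k1=1))  # Theorem 4(ii): (R1 + R2, R3, 0)
print(nc_nc_to_c_nc([1.0, 2.0, 3.0], k1=1))  # Theorem 4(ii): (0, R1, R2 + R3)
```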

Corollary 6.1 (Sum-rate equivalence between NC-NC and C-NC) The minimum sum-rates of the $(k_1,k_2)$-stage delayed NC-NC systems and the $(k_1+k_2)$-stage delayed C-NC systems are equal:

$$R_{k_1,k_2,\mathrm{sum}}^{NC\text{-}NC}(\mathbf{D}) = R_{k_1+k_2,\mathrm{sum}}^{C\text{-}NC}(\mathbf{D}).$$

In conclusion, any two delayed sequential coding systems whose encoding and decoding frame-delays have the same sum have the same sum-rate-distortion performance. For example, the NC-NC system in Fig. 7 has the same minimum sum-rate as the 2-stage delayed C-NC system in Fig. 6. Therefore we can always take the C-NC system as a representative of all the delayed sequential coding systems.

IX. Concluding remarks 

In this paper, motivated by video coding applications, we studied the problem of sequential coding of correlated 
sources with encoding and/or decoding frame-delays and characterized the fundamental tradeoffs between individual 
frame rates, individual frame distortions, and encoding/decoding frame-delays in terms of single-letter information- 
theoretic quantities. Our characterization of the rate-distortion region was for multiple sources, general inter-frame 
source correlations, and general frame-specific and coupled single-letter fidelity criteria. The main message of 
this study is that even a single frame-delay holds potential for yielding significant performance improvements in 
sequential coding problems, sometimes even matching the joint coding performance. 
Appendix I
Corollary 1.2 Proof

We first show that the right-hand side of (3.6) is an upper bound of $R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D})$ by defining auxiliary random variables satisfying all the constraints in (3.4) and evaluating the objective function of (3.4). Then we show that the right-hand side of (3.6) is also a lower bound of $R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D})$ using information inequalities.

Upper bound: Due to the Markov chains in (3.4),

$$I(X^3;\hat{X}^3) = I(X_1;\hat{X}_1) + I(X^2;\hat{X}_2 \mid \hat{X}_1) + I(X^3;\hat{X}_3 \mid \hat{X}^2), \qquad (I.1)$$

where on the right-hand side, each term corresponds to a stage of coding. We will sequentially define $\hat{X}_1,\hat{X}_2,\hat{X}_3$ and evaluate the expression stage by stage to highlight the structure of the optimal (achievable) coding scheme.

First, since $D_1 \le \sigma_1^2$, we can find a random variable $\hat{X}_1$ such that: (1) $\hat{X}_1 + Z_1 = X_1$, (2) $\hat{X}_1$ and $Z_1$ are independent Gaussian variables with variances $(\sigma_1^2 - D_1)$ and $D_1$, respectively, and (3) the Markov chain $\hat{X}_1 - X_1 - X_2^3$ holds. The MSE constraint $E(X_1 - \hat{X}_1)^2 \le D_1$ is satisfied because $E(Z_1^2) = D_1$. Note that the distribution of $(X_1,\hat{X}_1)$ achieves the rate-distortion function for the Gaussian source $X_1$, and we have

$$I(X_1;\hat{X}_1) = \frac{1}{2}\log\frac{\sigma_1^2}{D_1}. \qquad (I.2)$$
Since $(X_1,X_2)$ are jointly Gaussian, we have

$$X_2 = \rho_1\frac{\sigma_2}{\sigma_1}X_1 + N_1 = \rho_1\frac{\sigma_2}{\sigma_1}\hat{X}_1 + W_2,$$

where $N_1$ is a Gaussian variable with variance $(1-\rho_1^2)\sigma_2^2$ which is independent of $(\hat{X}_1,Z_1)$. Here $W_2 := \rho_1\frac{\sigma_2}{\sigma_1}Z_1 + N_1$ is the innovation from $\hat{X}_1$ to $X_2$, whose variance $\sigma_{W_2}^2$ is given in (3.5). When $D_2 \le \sigma_{W_2}^2$, we can find a random variable $\hat{W}_2$ such that: (1) $\hat{W}_2 + Z_2 = W_2$, (2) $\hat{W}_2$ and $Z_2$ are independent Gaussian variables with variances $(\sigma_{W_2}^2 - D_2)$ and $D_2$, respectively, and (3) the Markov chain $\hat{W}_2 - (X_2,\hat{X}_1) - (X_1,X_3)$ holds. Define $\hat{X}_2 := \rho_1\frac{\sigma_2}{\sigma_1}\hat{X}_1 + \hat{W}_2$, which implies $X_2 = \hat{X}_2 + Z_2$. The MSE constraint $E(X_2 - \hat{X}_2)^2 \le D_2$ is satisfied because $E(Z_2^2) = D_2$. The Markov chain constraint $\hat{X}_2 - (X^2,\hat{X}_1) - X_3$ is also satisfied. Note that the distribution of $(W_2,\hat{W}_2)$ achieves the rate-distortion function for the Gaussian source $W_2$, and we have



$$I(X^2;\hat{X}_2 \mid \hat{X}_1) = I(X_2;\hat{X}_2 \mid \hat{X}_1) = \frac{1}{2}\log\frac{\sigma_{W_2}^2}{D_2}, \qquad (I.3)$$

where the first step holds because $\hat{X}_2 - (X_2,\hat{X}_1) - X_1$ forms a Markov chain.
Similarly, we can define $\hat{X}_3$ such that, when $D_3 \le \sigma_{W_3}^2$,

$$I(X^3;\hat{X}_3 \mid \hat{X}^2) = \frac{1}{2}\log\frac{\sigma_{W_3}^2}{D_3}. \qquad (I.4)$$



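Finally, combining (I.1)-(I.4) and (3.4), when $\mathbf{D} \in \mathcal{D}^{C\text{-}C}$, we have

$$R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D}) \le \frac{1}{2}\log\frac{\sigma_1^2}{D_1} + \frac{1}{2}\log\frac{\sigma_{W_2}^2}{D_2} + \frac{1}{2}\log\frac{\sigma_{W_3}^2}{D_3}.$$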



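Lower bound: For any choice of $\hat{X}^3$ satisfying the constraints in (3.4), we have

$$\begin{aligned}
R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D}) &= \min\big[I(X_1;\hat{X}_1) + I(X^2;\hat{X}_2 \mid \hat{X}_1) + I(X^3;\hat{X}_3 \mid \hat{X}^2)\big]\\
&\ge \min\big[I(X_1;\hat{X}_1) + I(X_2;\hat{X}_2 \mid \hat{X}_1) + I(X_3;\hat{X}_3 \mid \hat{X}^2)\big]\\
&= \min\big[h(X_1) - h(X_1 \mid \hat{X}_1) + h(X_2 \mid \hat{X}_1) - h(X_2 \mid \hat{X}^2) + h(X_3 \mid \hat{X}^2) - h(X_3 \mid \hat{X}^3)\big]\\
&= h(X_1) + \min\big[h(X_2 \mid \hat{X}_1) - h(X_1 \mid \hat{X}_1) + h(X_3 \mid \hat{X}^2) - h(X_2 \mid \hat{X}^2) - h(X_3 \mid \hat{X}^3)\big]\\
&\ge \frac{1}{2}\log(2\pi e\sigma_1^2) - \frac{1}{2}\log(2\pi e D_3) + \min\big[h(X_2 \mid \hat{X}_1) - h(X_1 \mid \hat{X}_1)\big]\\
&\quad + \min\big[h(X_3 \mid \hat{X}^2) - h(X_2 \mid \hat{X}^2)\big], \qquad (I.5)
\end{aligned}$$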



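where all the minimizations above are subject to all the constraints in (3.4). By Lemma 5 in [9], since the Markov chain $\hat{X}_1 - X_1 - X_2$ holds and $h(X_1 \mid \hat{X}_1) \le \frac{1}{2}\log(2\pi e D_1)$, we have

$$\min\big[h(X_2 \mid \hat{X}_1) - h(X_1 \mid \hat{X}_1)\big] \ge \frac{1}{2}\log\frac{\sigma_{W_2}^2}{D_1}. \qquad (I.6)$$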
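For readers who want to see where (I.6) comes from without consulting [9], one short route, sketched here assuming the conditional entropy power inequality (entropies in nats) and offered as a reading aid rather than as the proof in [9], is the following. The Markov chain $\hat{X}_1 - X_1 - X_2$ makes the innovation $N_1$ in $X_2 = \rho_1\frac{\sigma_2}{\sigma_1}X_1 + N_1$ independent of $(X_1,\hat{X}_1)$, so

$$\begin{aligned}
e^{2h(X_2 \mid \hat{X}_1)} &\ge \rho_1^2\frac{\sigma_2^2}{\sigma_1^2}\,e^{2h(X_1 \mid \hat{X}_1)} + 2\pi e(1-\rho_1^2)\sigma_2^2,\\
h(X_2 \mid \hat{X}_1) - h(X_1 \mid \hat{X}_1) &\ge \frac{1}{2}\log\Big(\rho_1^2\frac{\sigma_2^2}{\sigma_1^2} + 2\pi e(1-\rho_1^2)\sigma_2^2\,e^{-2h(X_1 \mid \hat{X}_1)}\Big)\\
&\ge \frac{1}{2}\log\Big(\rho_1^2\frac{\sigma_2^2}{\sigma_1^2} + \frac{(1-\rho_1^2)\sigma_2^2}{D_1}\Big) = \frac{1}{2}\log\frac{\sigma_{W_2}^2}{D_1},
\end{aligned}$$

where the second inequality uses $h(X_1 \mid \hat{X}_1) \le \frac{1}{2}\log(2\pi e D_1)$ and the final equality is (3.5) with $j = 2$.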



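Similarly, since the Markov chain $\hat{X}^2 - X_2 - X_3$ holds and $h(X_2 \mid \hat{X}^2) \le h(X_2 \mid \hat{X}_2) \le \frac{1}{2}\log(2\pi e D_2)$, replacing $(\hat{X}_1, X_1, X_2)$ by $(\hat{X}^2, X_2, X_3)$, respectively, in the argument leading to (I.6), we have

$$\min\big[h(X_3 \mid \hat{X}^2) - h(X_2 \mid \hat{X}^2)\big] \ge \frac{1}{2}\log\frac{\sigma_{W_3}^2}{D_2}. \qquad (I.7)$$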
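Using (I.6) and (I.7) in (I.5), we have

$$\begin{aligned}
R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D}) &\ge \frac{1}{2}\log(2\pi e\sigma_1^2) - \frac{1}{2}\log(2\pi e D_3) + \frac{1}{2}\log\frac{\sigma_{W_2}^2}{D_1} + \frac{1}{2}\log\frac{\sigma_{W_3}^2}{D_2}\\
&= \frac{1}{2}\log\frac{\sigma_1^2}{D_1} + \frac{1}{2}\log\frac{\sigma_{W_2}^2}{D_2} + \frac{1}{2}\log\frac{\sigma_{W_3}^2}{D_3}.
\end{aligned}$$

In conclusion, the right-hand side of the above formula is both an upper bound and a lower bound of $R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D})$, and is thus equal to $R_{\mathrm{sum}}^{C\text{-}C\,GM}(\mathbf{D})$. ■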

Appendix II 
Theorem 3 forward proof 

For any tuple $(\mathbf{R},\mathbf{D})$ belonging to the right-hand side of (5.11), there exist random variables $U^2, \hat{X}_2^3$ and a function $g_1$ such that all the constraints in (5.11) are satisfied. We will describe the encoders and decoders with parameters $(M^3, R'_2, R'_3, \epsilon_1)$ in Subsections I to III. In Subsections IV and V, we choose the values of the parameters and analyze the rates and distortions to show that (2.1) and (2.2) hold for every $\epsilon > 0$ and sufficiently large $n$.

I. Generation of codebooks

1) Randomly generate a codebook $\mathcal C_1$ consisting of $M_1$ sequences (codewords) of length $n$ drawn iid $\sim \prod_{i=1}^n p_{U_1}(u_1(i))$. Index the codewords by $s_1 \in \{1, 2, \dots, M_1\}$, and denote the $s_1$-th codeword by $\mathbf U_1(s_1)$.

2) Randomly generate a codebook $\mathcal C_2$, independently of $\mathcal C_1$, consisting of $2^{nR'_2}$ sequences (codewords) of length $n$ drawn iid $\sim \prod_{i=1}^n p_{U_2}(u_2(i))$. Index the codewords by $s'_2 \in \{1, 2, \dots, 2^{nR'_2}\}$, and denote the $s'_2$-th codeword by $\mathbf U_2(s'_2)$. Then randomly assign each index to one of $M_2$ bins according to a uniform distribution on $\{1, 2, \dots, M_2\}$, where $M_2 \le 2^{nR'_2}$. Let $\mathcal B_2(s_2)$ denote the set of indices assigned to the $s_2$-th bin.

3) Randomly generate a codebook $\mathcal C_3$, independently of $(\mathcal C_1, \mathcal C_2)$, consisting of $2^{nR'_3}$ sequences (codewords) of length $n$ drawn iid $\sim \prod_{i=1}^n p_{\hat X_2\hat X_3}(\hat x_2(i), \hat x_3(i))$. Note that each component of each codeword is a pair $(\hat x_2(i), \hat x_3(i)) \in \hat{\mathcal X}_2 \times \hat{\mathcal X}_3$. Index the codewords by $s'_3 \in \{1, 2, \dots, 2^{nR'_3}\}$, and denote the $s'_3$-th codeword by $\hat{\mathbf X}_2^3(s'_3)$. Then randomly assign each index to one of $M_3$ bins according to a uniform distribution on $\{1, 2, \dots, M_3\}$, where $M_3 \le 2^{nR'_3}$. Let $\mathcal B_3(s_3)$ denote the set of indices assigned to the $s_3$-th bin.

Reveal all the codebooks to all the encoders and the decoders. 
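The random codebook construction with uniform binning can be sketched as follows. This is a toy illustration, not the proof's construction at scale. The function names and the pmf are hypothetical, and small integers stand in for the exponential sizes $2^{nR'_2}$ and $M_2$.

```python
import random

def generate_codebook(num_codewords, n, pmf, rng):
    """Draw num_codewords sequences of length n with symbols iid from pmf (symbol -> prob)."""
    symbols = list(pmf)
    weights = [pmf[s] for s in symbols]
    return [tuple(rng.choices(symbols, weights=weights, k=n)) for _ in range(num_codewords)]

def assign_bins(num_codewords, num_bins, rng):
    """Assign each 1-based codeword index to a bin drawn uniformly from {1, ..., num_bins}.

    Returns a dict mapping bin index s -> list of codeword indices (the set B(s))."""
    bins = {s: [] for s in range(1, num_bins + 1)}
    for idx in range(1, num_codewords + 1):
        bins[rng.randint(1, num_bins)].append(idx)
    return bins

# Toy sizes standing in for 2^{nR'_2} codewords and M_2 bins (hypothetical values).
rng = random.Random(2024)
n, num_codewords, num_bins = 8, 64, 8
p_u2 = {0: 0.5, 1: 0.5}  # hypothetical stand-in for p_{U_2}
codebook_2 = generate_codebook(num_codewords, n, p_u2, rng)
bins_2 = assign_bins(num_codewords, num_bins, rng)
```

Each bin is drawn independently per index, so bin sizes fluctuate around `num_codewords / num_bins`. Event $\mathcal E_5$ in Subsection IV controls the probability that a bin is much larger than this.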



II. Encoding

1) Given a source sequence $\mathbf X_1$, encoder-1 looks for a codeword $\mathbf U_1(s_1)$ in $\mathcal C_1$ such that $(\mathbf X_1, \mathbf U_1(s_1)) \in A^{*(n)}_{\epsilon_1}(p_{X_1U_1})$. Here $\epsilon_1 > 0$, and $A^{*(n)}_{\epsilon_1}(p_{X_1U_1})$ is the $\epsilon_1$-strongly typical set of length-$n$ sequences with respect to the joint distribution $p_{X_1U_1}$ [16]. For simplicity, we do not indicate either the distribution or the sequence length in the notation for a strongly typical set when there is no ambiguity. If no such codeword can be found, set $s_1 = 1$. If more than one such codeword exists, pick the one with the smallest index $s_1$. Encoder-1 sends $s_1$ as the message.

2) Given the sequences $\{\mathbf X_1, \mathbf X_2, \mathbf U_1(s_1)\}$, encoder-2 looks for a codeword $\mathbf U_2(s'_2)$ in $\mathcal C_2$ such that $(\mathbf X_1, \mathbf X_2, \mathbf U_1(s_1), \mathbf U_2(s'_2)) \in A^*_{\epsilon_1}$. If no such codeword exists, set $s'_2 = 1$. If more than one such codeword exists, pick the one with the smallest $s'_2$. Encoder-2 sends the bin index $s_2$ such that $s'_2 \in \mathcal B_2(s_2)$.

3) Given the sequences $\{\mathbf X_1, \mathbf X_2, \mathbf X_3, \mathbf U_1(s_1), \mathbf U_2(s'_2)\}$, encoder-3 looks for a codeword $\hat{\mathbf X}_2^3(s'_3)$ in codebook $\mathcal C_3$ such that $(\mathbf X^3, \mathbf U_1(s_1), \mathbf U_2(s'_2), \hat{\mathbf X}_2^3(s'_3)) \in A^*_{\epsilon_1}$. If no such codeword exists, set $s'_3 = 1$. If more than one such codeword exists, pick the one with the smallest $s'_3$. Encoder-3 sends the bin index $s_3$ such that $s'_3 \in \mathcal B_3(s_3)$.
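Each encoder runs the same rule: scan the codebook, return the first index whose codeword is jointly strongly typical with everything the encoder knows, and fall back to index 1. The sketch below implements that rule for discrete alphabets. The per-letter tolerance `eps / len(pmf)` follows the usual form of strong typicality in [16], but treat the exact constant as an assumption. The helper names and the toy test channel $U_1 = X_1$ are hypothetical.

```python
from collections import Counter

def is_strongly_typical(seqs, pmf, eps):
    """Strong typicality test for equal-length sequences viewed as one joint sequence.

    pmf maps joint letters (tuples) to probabilities and should list the whole joint
    alphabet. Assumed convention: every letter with p(a) > 0 has empirical frequency
    within eps / len(pmf) of p(a), and no letter with p(a) = 0 occurs."""
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("all sequences must have the same length")
    counts = Counter(zip(*seqs))
    if any(pmf.get(letter, 0.0) == 0.0 for letter in counts):
        return False
    tol = eps / len(pmf)
    return all(abs(counts[a] / n - p) < tol for a, p in pmf.items() if p > 0)

def encode(source_seqs, codebook, pmf, eps):
    """Smallest 1-based index s with (source_seqs..., codebook[s-1]) jointly typical; 1 if none."""
    for s, codeword in enumerate(codebook, start=1):
        if is_strongly_typical(list(source_seqs) + [codeword], pmf, eps):
            return s
    return 1

# Toy example: X_1 uniform on {0, 1} and U_1 = X_1 (a hypothetical test channel).
p_x1u1 = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}
x1 = (0, 1, 0, 1, 0, 1, 0, 1)
codebook_1 = [(1,) * 8, x1, x1]
s1 = encode([x1], codebook_1, p_x1u1, eps=0.1)
```

Encoders 2 and 3 use the same call with more source sequences in `source_seqs` and a joint pmf over the correspondingly longer tuples.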
III. Decoding

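1) Given the received indices $(s_1, s_2)$, decoder-1 looks for a sequence $\mathbf U_2(\hat s'_2)$ such that $\hat s'_2 \in \mathcal B_2(s_2)$ and $(\mathbf U_1(s_1), \mathbf U_2(\hat s'_2)) \in A^*_{\epsilon_1}$. If more than one such sequence exists, pick the one with the smallest $\hat s'_2$. Generate the reproduction sequence $\hat{\mathbf X}_1$ by

$$\hat X_1(i) = g_1\big(U_1(s_1, i), U_2(\hat s'_2, i)\big), \quad i = 1, \dots, n,$$

where $\hat X_1(i)$, $U_1(s_1, i)$, and $U_2(\hat s'_2, i)$ are the $i$-th components of the sequences $\hat{\mathbf X}_1$, $\mathbf U_1(s_1)$, and $\mathbf U_2(\hat s'_2)$, respectively.

2) Given the received index $s_3$ and the previously decoded indices $(s_1, \hat s'_2)$, decoder-2 looks for a sequence $\hat{\mathbf X}_2^3(\hat s'_3)$ such that $\hat s'_3 \in \mathcal B_3(s_3)$ and $(\mathbf U_1(s_1), \mathbf U_2(\hat s'_2), \hat{\mathbf X}_2^3(\hat s'_3)) \in A^*_{\epsilon_1}$. If more than one such sequence exists, pick the one with the smallest $\hat s'_3$. Split $\hat{\mathbf X}_2^3(\hat s'_3)$, each component of which is a pair, to obtain the reproduction sequences $\hat{\mathbf X}_2(\hat s'_3)$ and $\hat{\mathbf X}_3(\hat s'_3)$. This decoder is conceptually the combination of decoder-2 and decoder-3 in Fig. 0a).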
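Both decoders search only inside the received bin and keep the smallest index that is jointly typical with their side information. A minimal sketch of that search follows. A Hamming-distance predicate stands in for joint typicality, and that predicate is a hypothetical simplification. The appendix does not say what happens when nothing in the bin qualifies, so returning `None` is also an assumption.

```python
def decode_from_bin(bin_indices, codebook, side_info, is_typical):
    """Return the smallest index s' in the bin whose codeword passes is_typical(side_info, codeword).

    Returns None when no index in the bin qualifies (a convention assumed by this sketch)."""
    for s in sorted(bin_indices):
        if is_typical(side_info, codebook[s - 1]):
            return s
    return None

def within_hamming(a, b, radius=1):
    """Hypothetical stand-in for a joint-typicality test: Hamming distance <= radius."""
    return sum(x != y for x, y in zip(a, b)) <= radius

codebook_2 = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0)]
u1 = (1, 0, 0, 0)  # decoder-1's side information, standing in for U_1(s_1)
decoded = decode_from_bin([2, 4], codebook_2, u1, within_hamming)
```

Codeword 1 is also "typical" with `u1` here but is ignored unless its index lies in the received bin. A second, unintended match inside the bin is exactly what events $\mathcal E_6$ and $\mathcal E_{10}$ bound.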

IV. Analysis of the probabilities of error events

Let us consider the following "error events" $\mathcal E_1$ through $\mathcal E_{11}$. If none of them happens, the decoders successfully reproduce what the encoders intend to send, and the expected distortions are close to $E[d_j(X^j, \hat X^j)]$, which is not greater than $D_j$. Otherwise, if any of these events happens, the decoders may make mistakes in the reproduction, and we bound the distortions by the worst-case distortion $d_{j,\max}$.

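• $\mathcal E_1$ (Frame 1 not typical): $\mathbf X_1 \notin A^*_{\epsilon_1}$.

$\Pr(\mathcal E_1) \to 0$ as $n \to \infty$ by the strong law of large numbers.

• $\mathcal E_2$ (Encoder-1 fails to find a codeword): Given any (deterministic) sequence $\mathbf x_1 \in A^*_{\epsilon_1}$, there is no $s_1$ such that $(\mathbf x_1, \mathbf U_1(s_1)) \in A^*_{\epsilon_1}$.
By [16, Lemma 13.6.2, p. 359], for any typical sequence $\mathbf x_1$ and each codeword $\mathbf U_1(s_1)$ randomly generated iid according to $p_{U_1}$, we have

$$2^{-n(I(X_1;U_1)+\epsilon_2)} \le \Pr\big((\mathbf x_1, \mathbf U_1(s_1)) \in A^*_{\epsilon_1}\big) \le 2^{-n(I(X_1;U_1)-\epsilon_2)},$$

where $\epsilon_2$ depends on $\epsilon_1$ and $n$, and $\epsilon_2 \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$. Therefore

$$\Pr(\mathcal E_2) = \Big(1 - \Pr\big((\mathbf x_1, \mathbf U_1(1)) \in A^*_{\epsilon_1}\big)\Big)^{M_1} \le \exp\big(-M_1 2^{-n(I(X_1;U_1)+\epsilon_2)}\big),$$

where the inequality holds because $(1-x)^M \le \exp(-Mx)$ for $x \in [0, 1]$. Let $M_1 := 2^{n(R_1+\epsilon_1+\epsilon_2)}$. Since $R_1 \ge I(X_1;U_1)$, we have

$$\Pr(\mathcal E_2) \le \exp\big(-2^{n(R_1+\epsilon_1-I(X_1;U_1))}\big) \le \exp\big(-2^{n\epsilon_1}\big),$$

which goes to zero as $n \to \infty$.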

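• $\mathcal E_3$ (Message 1 not jointly typical with frame 2): Given any sequences $(\mathbf x_1, \mathbf U_1(s_1)) \in A^*_{\epsilon_1}$, $(\mathbf x_1, \mathbf X_2, \mathbf U_1(s_1)) \notin A^*_{\epsilon_1}$.
By the Markov lemma [16, Lemma 14.8.1, p. 436], since the Markov chain $U_1 - X_1 - X_2$ holds and $\mathbf X_2$ is drawn iid $\sim p_{X_2|X_1}$, we have $\Pr\big((\mathbf x_1, \mathbf X_2, \mathbf U_1(s_1)) \notin A^*_{\epsilon_1}\big) \le \epsilon_1$ for $n$ sufficiently large. Therefore $\Pr(\mathcal E_3) \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$.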

• $\mathcal E_4$ (Encoder-2 fails to find a codeword): Given any sequences $(\mathbf x_1, \mathbf x_2, \mathbf U_1(s_1)) \in A^*_{\epsilon_1}$, there is no $s'_2$ such that $(\mathbf x_1, \mathbf x_2, \mathbf U_1(s_1), \mathbf U_2(s'_2)) \in A^*_{\epsilon_1}$.
By arguments similar to those used in the analysis of $\mathcal E_2$, we have

$$\Pr(\mathcal E_4) \le \exp\big(-2^{n(R'_2 - I(X^2U_1;U_2) - \epsilon_3)}\big),$$

where $\epsilon_3 \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$. Let

$$R'_2 := I(X^2U_1;U_2) + \epsilon_1 + \epsilon_3.$$

We have $\Pr(\mathcal E_4) \le \exp(-2^{n\epsilon_1})$, which goes to zero as $n \to \infty$.

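• $\mathcal E_5$ (Encoder-2's bin size too large): Given that $s'_2 \in \mathcal B_2(s_2)$, the cardinality of the $s_2$-th bin satisfies

$$|\mathcal B_2(s_2)| > \frac{2^{n(R'_2+\epsilon_1)}}{M_2} + 1.$$

Because $s'_2 \in \mathcal B_2(s_2)$ and the other $(2^{nR'_2} - 1)$ codewords are randomly assigned, $(|\mathcal B_2(s_2)| - 1)$ follows the binomial distribution with parameters $(2^{nR'_2} - 1, 1/M_2)$. We use the following Chernoff bound [22, Thm 4.4(3), p. 64]: for a binomial random variable $X$ with parameters $(m, p)$, if $a \ge 6mp$, then $\Pr(X \ge a) \le 2^{-a}$. Suppose $n\epsilon_1 > 3$, which guarantees $2^{n\epsilon_1} > 6$. Taking $a := 2^{n(R'_2+\epsilon_1)}/M_2$, so that $a \ge 6(2^{nR'_2} - 1)/M_2$, we have

$$\Pr\Big(|\mathcal B_2(s_2)| - 1 > \frac{2^{n(R'_2+\epsilon_1)}}{M_2}\Big) \le 2^{-2^{n(R'_2+\epsilon_1)}/M_2}.$$

Since $M_2 \le 2^{nR'_2}$, the exponent satisfies $2^{n(R'_2+\epsilon_1)}/M_2 \ge 2^{n\epsilon_1}$, so $\Pr(\mathcal E_5) \to 0$ as $n \to \infty$.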

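• $\mathcal E_6$ (Decoder-1 fails to identify the correct codeword from the bin): Consider the bin $\mathcal B_2(s_2)$, whose size is not greater than $2^{n(R'_2+\epsilon_1)}/M_2 + 1$. Given any sequence $\mathbf U_1(s_1) \in A^*_{\epsilon_1}$, there exists $\hat s'_2 \ne s'_2$ in $\mathcal B_2(s_2)$ such that $(\mathbf U_1(s_1), \mathbf U_2(\hat s'_2)) \in A^*_{\epsilon_1}$.

By arguments similar to those used in the analysis of $\mathcal E_2$, we have

$$\Pr\big((\mathbf U_1(s_1), \mathbf U_2(\hat s'_2)) \in A^*_{\epsilon_1}\big) \le 2^{-n(I(U_1;U_2)-\epsilon_4)},$$

where $\epsilon_4 \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$. By the union bound,

$$\Pr(\mathcal E_6) \le \big(|\mathcal B_2(s_2)| - 1\big)\, 2^{-n(I(U_1;U_2)-\epsilon_4)} \le \frac{2^{n(R'_2+\epsilon_1-I(U_1;U_2)+\epsilon_4)}}{M_2}.$$

Recall that

$$R'_2 = I(X^2U_1;U_2) + \epsilon_1 + \epsilon_3.$$

Let $M_2 := 2^{n(R_2+3\epsilon_1+\epsilon_3+\epsilon_4)}$. Since $I(X^2U_1;U_2) - I(U_1;U_2) = I(X^2;U_2|U_1)$ and $R_2 \ge I(X^2;U_2|U_1)$, we can simplify the bound to

$$\Pr(\mathcal E_6) \le 2^{n(I(X^2;U_2|U_1)-R_2-\epsilon_1)} \le 2^{-n\epsilon_1},$$

which goes to zero as $n \to \infty$.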

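• $\mathcal E_7$ (Frame 3 not jointly typical with previous messages): Given any sequences $(\mathbf x_1, \mathbf x_2, \mathbf U_1(s_1), \mathbf U_2(s'_2)) \in A^*_{\epsilon_1}$, $(\mathbf x_1, \mathbf x_2, \mathbf U_1(s_1), \mathbf U_2(s'_2), \mathbf X_3) \notin A^*_{\epsilon_1}$.

The Markov chains $U_1 - X_1 - X_2^3$ and $U_2 - (X^2, U_1) - X_3$ together imply the Markov chain $U^2 - X^2 - X_3$. By arguments similar to those used in the analysis of $\mathcal E_3$, this chain implies that $\Pr(\mathcal E_7) \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$.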
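• $\mathcal E_8$ (Encoder-3 fails to find a codeword): Given any sequences $(\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf U_1(s_1), \mathbf U_2(s'_2)) \in A^*_{\epsilon_1}$, there is no $s'_3$ such that $(\mathbf x_1, \mathbf x_2, \mathbf x_3, \mathbf U_1(s_1), \mathbf U_2(s'_2), \hat{\mathbf X}_2^3(s'_3)) \in A^*_{\epsilon_1}$.

By arguments similar to those used in the analysis of $\mathcal E_2$, we have

$$\Pr(\mathcal E_8) \le \exp\big(-2^{n(R'_3 - I(X^3U^2;\hat X_2^3) - \epsilon_5)}\big),$$

where $\epsilon_5 \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$. Let

$$R'_3 := I(X^3U^2;\hat X_2^3) + \epsilon_1 + \epsilon_5.$$

We have $\Pr(\mathcal E_8) \le \exp(-2^{n\epsilon_1})$, which goes to zero as $n \to \infty$.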

• $\mathcal E_9$ (Encoder-3's bin size too large): Given that $s'_3 \in \mathcal B_3(s_3)$, the cardinality of the bin satisfies

$$|\mathcal B_3(s_3)| > \frac{2^{n(R'_3+\epsilon_1)}}{M_3} + 1.$$

By arguments similar to those used in the analysis of $\mathcal E_5$, $\Pr(\mathcal E_9) \to 0$ as $n \to \infty$.

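• $\mathcal E_{10}$ (Decoder-2 fails to identify the correct codeword from the bin): Consider the bin $\mathcal B_3(s_3)$, whose size is not greater than $2^{n(R'_3+\epsilon_1)}/M_3 + 1$. Given any sequences $(\mathbf U_1(s_1), \mathbf U_2(s'_2)) \in A^*_{\epsilon_1}$, there exists $\hat s'_3 \ne s'_3$ such that $(\mathbf U_1(s_1), \mathbf U_2(s'_2), \hat{\mathbf X}_2^3(\hat s'_3)) \in A^*_{\epsilon_1}$.

By arguments similar to those used in the analysis of $\mathcal E_6$, we have

$$\Pr(\mathcal E_{10}) \le \frac{2^{n(I(X^3;\hat X_2^3|U^2)+2\epsilon_1+\epsilon_5+\epsilon_6)}}{M_3},$$

where $\epsilon_6 \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$. Let $M_3 := 2^{n(R_3+3\epsilon_1+\epsilon_5+\epsilon_6)}$. Since $R_3 \ge I(X^3;\hat X_2^3|U^2)$, we have

$$\Pr(\mathcal E_{10}) \le 2^{-n\epsilon_1},$$

which goes to zero as $n \to \infty$.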

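• $\mathcal E_{11}$ (Reproduction of frame 1 not jointly typical with the other sequences): Given any sequences $(\mathbf x^3, \hat{\mathbf X}_2^3(s'_3), \mathbf U_1(s_1), \mathbf U_2(s'_2)) \in A^*_{\epsilon_1}$ and correct decoding $\hat s'_2 = s'_2$, $(\mathbf x^3, \hat{\mathbf X}_2^3(s'_3), \mathbf U_1(s_1), \mathbf U_2(s'_2), \hat{\mathbf X}_1) \notin A^*_{\epsilon_1}$.

Although $\hat{\mathbf X}_1$ depends deterministically on $(\mathbf U_1(s_1), \mathbf U_2(s'_2))$ through the function $g_1$, we can regard $p_{\hat X_1|U_1U_2}$ as a degenerate probability distribution. We then use the Markov lemma with the trivial Markov chain $(X^3, \hat X_2^3) - U^2 - g_1(U^2)$ to show that $\Pr(\mathcal E_{11}) \to 0$ as $\epsilon_1 \to 0$ and $n \to \infty$.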
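Two elementary inequalities carry the analyses of $\mathcal E_2$ and $\mathcal E_5$:
- $(1-q)^M \le e^{-Mq}$.
- The Chernoff bound $\Pr(X \ge a) \le 2^{-a}$ for a binomial $X$ with $a \ge 6\,E[X]$.

The sketch below checks both on toy sizes. It uses exact rational arithmetic for the binomial tail. The sizes ($2^8$ codewords, $2^5$ bins, $n\epsilon_1 = 4$) are hypothetical stand-ins for the exponential quantities in the proof.

```python
import math
from fractions import Fraction

def miss_probability(M, q):
    """Probability that none of M independent codewords is typical when each is w.p. q (event E2)."""
    return (1 - q) ** M

def binomial_upper_tail(n, p, a):
    """Exact Pr(X >= a) for X ~ Binomial(n, p), with p given as a Fraction."""
    start = max(0, math.ceil(a))
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(start, n + 1))

# E5 with toy sizes (hypothetical): 2^{nR'_2} = 2^8 codewords, M_2 = 2^5 bins, n*eps_1 = 4.
log_codebook, log_bins, n_eps1 = 8, 5, 4
trials = 2**log_codebook - 1                   # codewords other than s'_2
p = Fraction(1, 2**log_bins)                   # each lands in bin s_2 w.p. 1/M_2
a = 2**(log_codebook + n_eps1 - log_bins)      # a = 2^{n(R'_2 + eps_1)} / M_2
tail = binomial_upper_tail(trials, p, a)
print(float(tail), 2.0**-a)
```

The hypothesis $a \ge 6mp$ holds here because $2^{n\epsilon_1} = 16 > 6$. This is the role of the condition $n\epsilon_1 > 3$ in the analysis of $\mathcal E_5$.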

V. Analysis of the distortions

Consider the union of all the above events, $\mathcal E := \bigcup_{i=1}^{11} \mathcal E_i$. Suppose the codebooks are randomly generated according to Subsection I. Since $\Pr(\mathcal E_i)$ vanishes for $i = 1, \dots, 11$ as $\epsilon_1 \to 0$ and $n \to \infty$, $\Pr(\mathcal E)$ also vanishes. Therefore there must exist a sequence of codebooks $\{(\mathcal C_{1,l}, \mathcal C_{2,l}, \mathcal C_{3,l})\}_{l=1}^{\infty}$ for which $\Pr(\mathcal E) \to 0$. For these fixed codebooks, the only remaining randomness is in the source sequences. We focus on these codebooks in the following discussion.

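In the case that $\mathcal E$ does not happen, all the sequences are jointly $\epsilon_1$-strongly typical, that is, $(\mathbf X^3, \mathbf U_1(s_1), \mathbf U_2(s'_2), \hat{\mathbf X}_1, \hat{\mathbf X}_2^3(s'_3)) \in A^*_{\epsilon_1}$. The decoded indices are also correct: $\hat s'_2 = s'_2$ and $\hat s'_3 = s'_3$. Since the expected distortion is a continuous function of the joint distribution, strong typicality implies distortion typicality. In other words,

$$\Big|E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\,\big|\,\mathcal E^c\big] - E\big[d_j(X^j, \hat X^j)\big]\Big| \le \epsilon_{d_j}, \quad j = 1, 2, 3,$$

where $\epsilon_{d_j} \to 0$ as $\epsilon_1 \to 0$. Since $D_j \ge E[d_j(X^j, \hat X^j)]$, we have $E[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)|\mathcal E^c] \le D_j + \epsilon_{d_j}$, $j = 1, 2, 3$.
In the case that $\mathcal E$ does happen, by the definition of $d_{j,\max}$, we have $E[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)|\mathcal E] \le d_{j,\max}$, $j = 1, 2, 3$.
Therefore the expected distortion for the $j$-th frame satisfies

$$
\begin{aligned}
E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\big] &= E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\,\big|\,\mathcal E\big]\Pr(\mathcal E) + E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\,\big|\,\mathcal E^c\big]\big(1 - \Pr(\mathcal E)\big) \\
&\le d_{j,\max}\Pr(\mathcal E) + E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\,\big|\,\mathcal E^c\big] \\
&\le d_{j,\max}\Pr(\mathcal E) + D_j + \epsilon_{d_j}.
\end{aligned}
$$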
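When $\epsilon_1 \to 0$ and $n \to \infty$, for the codebooks $\{(\mathcal C_{1,l}, \mathcal C_{2,l}, \mathcal C_{3,l})\}_{l=1}^{\infty}$, $\Pr(\mathcal E)$ and all the $\epsilon$ variables vanish. Therefore, for every $\epsilon > 0$, by driving these variables to their limits, we can always find $\epsilon_1 > 0$ such that, for sufficiently large $n$,

$$
\begin{aligned}
\tfrac1n\log M_1 - R_1 &= \epsilon_1 + \epsilon_2 < \epsilon, \\
\tfrac1n\log M_2 - R_2 &= 3\epsilon_1 + \epsilon_3 + \epsilon_4 < \epsilon, \\
\tfrac1n\log M_3 - R_3 &= 3\epsilon_1 + \epsilon_5 + \epsilon_6 < \epsilon, \\
E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\big] - D_j &\le d_{j,\max}\Pr(\mathcal E) + \epsilon_{d_j} < \epsilon.
\end{aligned}
$$

Therefore (2.1) and (2.2) hold, which completes the proof. ■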
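The parameter choices for each binned stage can be checked mechanically. The sketch below uses hypothetical helper names and works in bits per source symbol, so every exponent is divided by $n$. It recomputes $R'_j$, $\frac1n\log M_j$, the covering exponent of $\mathcal E_4$/$\mathcal E_8$, and the bin-confusion exponent of $\mathcal E_6$/$\mathcal E_{10}$ from the definitions above. The mutual-information values are made-up numbers.

```python
def binning_exponents(i_joint, i_side, rate, eps1, eps_a, eps_b):
    """Exponents (bits per symbol) for one binned stage of the forward proof.

    i_joint: I(X^2 U_1; U_2) for stage 2, or I(X^3 U^2; Xhat_2^3) for stage 3
    i_side:  I(U_1; U_2) for stage 2, or I(U^2; Xhat_2^3) for stage 3
    rate:    R_2 or R_3; (eps_a, eps_b) is (eps_3, eps_4) or (eps_5, eps_6)
    """
    codebook_rate = i_joint + eps1 + eps_a                 # R'_j
    bin_rate = rate + 3 * eps1 + eps_a + eps_b             # (1/n) log2 M_j
    cover_exponent = codebook_rate - i_joint - eps_a       # E4/E8: Pr <= exp(-2^{n * this})
    confusion_exponent = codebook_rate + eps1 - i_side + eps_b - bin_rate  # E6/E10: Pr <= 2^{n * this}
    return codebook_rate, bin_rate, cover_exponent, confusion_exponent

# Hypothetical stage-2 numbers: I(X^2 U_1; U_2) = 1.0 and I(U_1; U_2) = 0.3,
# so I(X^2; U_2 | U_1) = 0.7, and the rate sits exactly at R_2 = 0.7.
r_prime, m_rate, e_cover, e_confuse = binning_exponents(1.0, 0.3, 0.7, 0.01, 0.02, 0.03)
```

At $R_2 = I(X^2;U_2|U_1)$ the covering exponent equals $\epsilon_1$ and the confusion exponent equals $-\epsilon_1$, matching $\Pr(\mathcal E_4) \le \exp(-2^{n\epsilon_1})$ and $\Pr(\mathcal E_6) \le 2^{-n\epsilon_1}$. Stage 3 is the same call with $(I(X^3U^2;\hat X_2^3), I(U^2;\hat X_2^3), R_3, \epsilon_1, \epsilon_5, \epsilon_6)$.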

Appendix III 
Theorem 3 converse proof 

I. Information equalities

If a rate-distortion tuple $(\mathbf R, \mathbf D) = (R_1, \dots, R_T, D_1, \dots, D_T)$ is admissible for the 3-stage C-NC system, then for every $\epsilon > 0$ there exists $N(\epsilon)$ such that for all $n > N(\epsilon)$ there are blocklength-$n$ encoders and decoders satisfying

$$E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\big] \le D_j + \epsilon, \qquad \frac1n\log M_j \le R_j + \epsilon, \qquad j = 1, \dots, T.$$

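Denote the messages sent by the three ($T = 3$) encoders by $S_1$, $S_2$, and $S_3$, respectively, and define the auxiliary random variables $U_j(i) := (S^j, X^j(i^-))$, $j = 1, 2$. Here $S^j := (S_1, \dots, S_j)$, $X^j(l) := (X_1(l), \dots, X_j(l))$, $X^j(i^-) := (X^j(1), \dots, X^j(i-1))$, and $X_j(i^-) := (X_j(1), \dots, X_j(i-1))$. Due to the structure of the system, we have the Markov chains

$$(\mathbf X_2, \mathbf X_3) - \mathbf X_1 - S_1, \qquad \mathbf X_3 - (\mathbf X_1, \mathbf X_2) - (S_1, S_2),$$

which are readily verified. For the first coding rate, we have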

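$$
\begin{aligned}
n(R_1 + \epsilon) &\ge H(S_1) \\
&= H(S_1) - H(S_1|\mathbf X_1) \\
&= I(S_1; \mathbf X_1) \\
&= \sum_{i=1}^n I(X_1(i); S_1 \mid X_1(i^-)) \\
&\overset{(a)}{=} \sum_{i=1}^n I(X_1(i); S_1, X_1(i^-)) \\
&= \sum_{i=1}^n I(X_1(i); U_1(i)).
\end{aligned}
$$

The first equality holds because $S_1$ is a function of $\mathbf X_1$. Step (a) holds because $(X_1(i))_{i=1}^n$ are iid, so $I(X_1(i); X_1(i^-)) = 0$. The Markov chains $X_2^3(i) - X_1(i) - U_1(i)$, where $X_2^3(i) := (X_2(i), X_3(i))$, can be verified to hold for each $i = 1, \dots, n$.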
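Step (a) is where the iid assumption enters. The chain rule gives $I(S_1;\mathbf X_1) = \sum_i I(X_1(i); S_1 | X_1(i^-))$. For an iid frame this equals $\sum_i I(X_1(i); S_1, X_1(i^-))$. The sketch below verifies the identity numerically on a toy frame: $n = 3$ iid Bernoulli(0.3) samples with a hypothetical majority-bit encoder. It also checks that the sum does not exceed $H(S_1)$.

```python
import itertools
import math
from collections import defaultdict

def entropy_of(f, pmf):
    """Entropy in bits of f(x) when x is distributed according to pmf (outcome -> prob)."""
    dist = defaultdict(float)
    for x, p in pmf.items():
        dist[f(x)] += p
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mutual_info(f, g, pmf):
    """I(f(x); g(x)) = H(f) + H(g) - H(f, g)."""
    return entropy_of(f, pmf) + entropy_of(g, pmf) - entropy_of(lambda x: (f(x), g(x)), pmf)

# Toy frame: n = 3 iid Bernoulli(0.3) samples; hypothetical encoder S_1 = majority bit.
n, q = 3, 0.3
pmf = {x: math.prod(q if b else 1 - q for b in x) for x in itertools.product((0, 1), repeat=n)}
encoder = lambda x: int(sum(x) >= 2)

lhs = mutual_info(encoder, lambda x: x, pmf)          # I(S_1; X_1^n)
rhs = sum(mutual_info(lambda x, i=i: x[i],            # I(X_1(i); S_1, X_1(i^-))
                      lambda x, i=i: (encoder(x), x[:i]), pmf)
          for i in range(n))
h_s = entropy_of(encoder, pmf)                        # H(S_1)
```

If the pmf were replaced by a correlated one, `rhs` would exceed `lhs` by $\sum_i I(X_1(i); X_1(i^-))$. That is exactly the term step (a) drops.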

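In the next stage,

$$
\begin{aligned}
n(R_2 + \epsilon) &\ge H(S_2) \\
&\ge H(S_2|S_1) \\
&= H(S_2|S_1) - H(S_2|S_1, \mathbf X^2) \\
&= I(S_2; \mathbf X^2 | S_1) \\
&= \sum_{i=1}^n I(X^2(i); S_2 \mid U_1(i), X_2(i^-)) \\
&\overset{(b)}{=} \sum_{i=1}^n I(X^2(i); S_2, X_2(i^-) \mid U_1(i)) \\
&= \sum_{i=1}^n I(X^2(i); U_2(i) \mid U_1(i)),
\end{aligned}
$$

where $\mathbf X^2 := (\mathbf X_1, \mathbf X_2)$ and $H(S_2|S_1,\mathbf X^2) = 0$ because $S_2$ is a function of $\mathbf X^2$. Step (b) holds because the Markov chain $X^2(i) - U_1(i) - X_2(i^-)$ holds for each $i$. For each $i$, $\hat X_1(i)$ is a deterministic function of $(S_1, S_2)$, which is itself a deterministic function of $U_2(i)$. Therefore there exists a function $g_{1,i}$ such that $\hat X_1(i) = g_{1,i}(U_2(i))$ for each $i$. The Markov chain $X_3(i) - (X^2(i), U_1(i)) - U_2(i)$ can also be verified to hold for each $i = 1, \dots, n$.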
In the final stage,

$$
\begin{aligned}
n(R_3 + \epsilon) &\ge H(S_3) \\
&\ge H(S_3|S_1, S_2) \\
&= H(S_3|S_1, S_2) - H(S_3|S_1, S_2, \mathbf X^3) \\
&= I(S_3; \mathbf X^3 | S_1, S_2) \\
&= \sum_{i=1}^n I(X^3(i); S_3 \mid U_2(i), X_3(i^-)) \\
&\overset{(c)}{=} \sum_{i=1}^n I(X^3(i); S_3, X_3(i^-) \mid U_2(i)) \\
&\overset{(d)}{\ge} \sum_{i=1}^n I(X^3(i); \hat X_2^3(i) \mid U_2(i)).
\end{aligned}
$$

Step (c) holds because the Markov chain $X^3(i) - U_2(i) - X_3(i^-)$ holds for each $i$. Step (d) holds because, for each $i = 1, \dots, n$, $\hat X_2^3(i)$ is a deterministic function of $(S_1, S_2, S_3)$, which is contained in $(S_3, U_1(i), U_2(i))$.

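Hence we have shown that for any admissible rate-distortion tuple $(\mathbf R, \mathbf D)$ and every $\epsilon > 0$, there exists $N(\epsilon)$ such that for all $n > N(\epsilon)$,

$$
\begin{aligned}
R_1 + \epsilon &\ge \frac1n\sum_{i=1}^n I(X_1(i); U_1(i)), \\
R_2 + \epsilon &\ge \frac1n\sum_{i=1}^n I(X^2(i); U_2(i) \mid U_1(i)), \\
R_3 + \epsilon &\ge \frac1n\sum_{i=1}^n I(X^3(i); \hat X_2^3(i) \mid U_2(i)), \\
D_j + \epsilon &\ge E\big[d_j^{(n)}(\mathbf X^j, \hat{\mathbf X}^j)\big], \quad j = 1, \dots, T, \\
\hat X_1(i) &= g_{1,i}(U_2(i)), \quad i = 1, \dots, n,
\end{aligned}
$$

and the Markov chains $X_2^3(i) - X_1(i) - U_1(i)$ and $X_3(i) - (X^2(i), U_1(i)) - U_2(i)$ hold for each $i$. Note that the Markov chains imply that

$$\sum_{i=1}^n I(U_1(i); X_2^3(i) \mid X_1(i)) = 0, \tag{III.8}$$

$$\sum_{i=1}^n I(U_2(i); X_3(i) \mid X^2(i), U_1(i)) = 0. \tag{III.9}$$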
II. Time-sharing

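We introduce a time-sharing random variable $Q$, uniformly distributed on $\{1, \dots, n\}$ and independent of all the other random variables. We have

$$R_1 + \epsilon \ge \frac1n\sum_{i=1}^n I(X_1(i); U_1(i)) = I(X_1(Q); U_1(Q) \mid Q) = I(X_1(Q); U_1(Q), Q),$$

where the last equality holds because the source is iid across $i$, so $X_1(Q)$ is independent of $Q$. Similarly, we have

$$
\begin{aligned}
R_2 + \epsilon &\ge I(X^2(Q); U_2(Q) \mid U_1(Q), Q), \\
R_3 + \epsilon &\ge I(X^3(Q); \hat X_2^3(Q) \mid U_2(Q), Q), \\
D_j + \epsilon &\ge E\big[d_j(X^j(Q), \hat X^j(Q))\big].
\end{aligned}
$$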

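Now define $U_1 := (U_1(Q), Q)$, $U_2 := U_2(Q)$, $X_j := X_j(Q)$, and $\hat X_j := \hat X_j(Q)$ for $j = 1, \dots, T$. Also define the deterministic function $g_1$ as follows:

$$g_1(U^2) = g_1(U_2(Q), Q) := g_{1,Q}(U_2(Q)) = \hat X_1(Q) = \hat X_1,$$

which is consistent with the definitions of $U^2$ and $\hat X_1$. Then we have the inequalities

$$
\begin{aligned}
R_1 + \epsilon &\ge I(X_1; U_1), &&\text{(III.10)} \\
R_2 + \epsilon &\ge I(X^2; U_2 \mid U_1), &&\text{(III.11)} \\
R_3 + \epsilon &\ge I(X^3; \hat X_2^3 \mid U^2), &&\text{(III.12)} \\
D_j + \epsilon &\ge E\big[d_j(X^j, \hat X^j)\big], \quad j = 1, \dots, T. &&\text{(III.13)}
\end{aligned}
$$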

Concerning the Markov chains, note that

$$I(U_1; X_2^3 \mid X_1) = I(U_1(Q), Q; X_2^3(Q) \mid X_1(Q)) = I(Q; X_2^3(Q) \mid X_1(Q)) + I(U_1(Q); X_2^3(Q) \mid X_1(Q), Q).$$

The first term is zero because $Q$ is independent of the sources and the source is iid across $i$, so $(X_1(Q), X_2^3(Q))$ is independent of $Q$. The second term equals $\frac1n\sum_{i=1}^n I(U_1(i); X_2^3(i) \mid X_1(i))$, which is zero by (III.8). Hence the Markov chain $U_1 - X_1 - X_2^3$ holds. Furthermore, we have

$$I(U_2; X_3 \mid X^2, U_1) = I(U_2(Q); X_3(Q) \mid X^2(Q), U_1(Q), Q) = 0$$

because of (III.9). Hence the Markov chain $U_2 - (X^2, U_1) - X_3$ holds.
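The time-sharing step relies on two facts:
- $I(X(Q); U(Q) \mid Q)$ is the per-letter average.
- Conditioning on $Q$ can be traded for including $Q$ in the auxiliary variable, because $X(Q)$ is independent of $Q$.

The sketch below checks both on a hypothetical two-letter example. In letter 1, $U = X$; in letter 2, $U$ is independent of $X$. Both letters share the same $X$-marginal, as the iid source requires.

```python
import math
from collections import defaultdict

def H(pmf, coords):
    """Entropy in bits of the marginal of pmf on the given coordinate positions."""
    marginal = defaultdict(float)
    for outcome, p in pmf.items():
        marginal[tuple(outcome[c] for c in coords)] += p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def cmi(pmf, a, b, c=()):
    """Conditional mutual information I(A; B | C) for tuples of coordinate positions."""
    return H(pmf, a + c) + H(pmf, b + c) - H(pmf, a + b + c) - H(pmf, c)

# Hypothetical per-letter joint pmfs of (x, u); both have a uniform X-marginal.
per_letter = {
    1: {(0, 0): 0.5, (1, 1): 0.5},                        # U = X
    2: {(x, u): 0.25 for x in (0, 1) for u in (0, 1)},   # U independent of X
}
# Joint pmf of (q, x, u) with Q uniform over the letters.
pmf = {(q, x, u): p / len(per_letter)
       for q, dist in per_letter.items() for (x, u), p in dist.items()}
Q, X, U = (0,), (1,), (2,)

average = sum(cmi(d, (0,), (1,)) for d in per_letter.values()) / len(per_letter)
mi_given_q = cmi(pmf, X, U, Q)      # I(X(Q); U(Q) | Q)
mi_with_q = cmi(pmf, X, U + Q)      # I(X(Q); U(Q), Q)
mi_without_q = cmi(pmf, X, U)       # I(X(Q); U(Q)), with Q discarded
```

The last quantity is smaller than the average. This illustrates why $Q$ is kept inside $U_1 := (U_1(Q), Q)$: with $Q$ included, the single-letter mutual information equals the time-shared average.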
III. Cardinality bounds on the alphabets of the auxiliary random variables

So far we have shown the following for any admissible rate-distortion tuple $(\mathbf R, \mathbf D)$, every $\epsilon > 0$, and sufficiently large $n$: inequalities (III.10) to (III.13) hold, and so do the Markov chains $U_1 - X_1 - X_2^3$ and $U_2 - (X^2, U_1) - X_3$. The definition $U_j(i) = (S^j, X^j(1), \dots, X^j(i-1))$ guarantees that $U_j(i)$ has a finite alphabet, although its cardinality grows with $n$. Therefore $U_1 = (U_1(Q), Q)$ and $U_2 = U_2(Q)$ also have finite alphabets $\mathcal U_1$ and $\mathcal U_2$ whose cardinalities grow with $n$. In this subsection, we use the Carathéodory theorem to find new random variables $U_1^*$ and $U_2^{**}$ with smaller alphabets whose sizes are independent of $n$. Inequalities (III.10) to (III.13) and the Markov chains still hold when $(U_1, U_2)$ is replaced by $(U_1^*, U_2^{**})$.

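Observe that we can define functionals $\{f_{x_1}\}_{x_1 \in \mathcal X_1}$, $f_{R_j}$, and $f_{d_j}$, $j = 1, 2, 3$, as follows. Note that they depend on $u_1$, on conditional distributions conditioned on $U_1$, and on the function $g_1$.

$$
\begin{aligned}
p_{X_1}(x_1) &= \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, p_{X_1|U_1}(x_1|u_1) =: \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, f_{x_1}(u_1, p_{X_1|U_1}), \quad x_1 \in \mathcal X_1, &&\text{(III.14)} \\
I(X_1; U_1) &= H(X_1) - \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, H(X_1|U_1 = u_1) =: \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, f_{R_1}(u_1, p_{X_1|U_1}), &&\text{(III.15)} \\
I(X^2; U_2|U_1) &= \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, I(X^2; U_2|U_1 = u_1) =: \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, f_{R_2}(u_1, p_{X^2U_2|U_1}), &&\text{(III.16)} \\
I(X^3; \hat X_2^3|U_1, U_2) &= \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, I(X^3; \hat X_2^3|U_1 = u_1, U_2) =: \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, f_{R_3}(u_1, p_{X^3U_2\hat X_2^3|U_1}), &&\text{(III.17)} \\
E[d_1(X_1, \hat X_1)] &= \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, E\big[d_1(X_1, g_1(u_1, U_2)) \,\big|\, U_1 = u_1\big] =: \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, f_{d_1}(u_1, p_{X_1U_2|U_1}, g_1), &&\text{(III.18)} \\
E[d_2(X^2, \hat X^2)] &= \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, E\big[d_2(X^2, g_1(u_1, U_2), \hat X_2) \,\big|\, U_1 = u_1\big] =: \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, f_{d_2}(u_1, p_{X^2U_2\hat X_2|U_1}, g_1), &&\text{(III.19)} \\
E[d_3(X^3, \hat X^3)] &= \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, E\big[d_3(X^3, g_1(u_1, U_2), \hat X_2^3) \,\big|\, U_1 = u_1\big] =: \sum_{u_1 \in \mathcal U_1} p_{U_1}(u_1)\, f_{d_3}(u_1, p_{X^3U_2\hat X_2^3|U_1}, g_1). &&\text{(III.20)}
\end{aligned}
$$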
We seek a new random variable $U_1^*$ to replace $U_1$ such that all the quantities in the above equations are preserved. Because the Markov chains $U_1 - X_1 - X_2^3$ and $U_2 - (X^2, U_1) - X_3$ hold, we can write the joint distribution as

$$p_{X^3U^2\hat X_2^3} = p_{U_1}\, p_{X_1|U_1}\, p_{X_2^3|X_1}\, p_{U_2|X^2U_1}\, p_{\hat X_2^3|X^3U^2}.$$

Fix $p_{X_1|U_1}$, $p_{X_2^3|X_1}$, $p_{U_2|X^2U_1}$, $p_{\hat X_2^3|X^3U^2}$, and $g_1$. Then the functionals $\{f_{x_1}\}_{x_1 \in \mathcal X_1}$, $f_{R_j}$, and $f_{d_j}$, $j = 1, 2, 3$, become functions depending solely on $u_1$.

Since $p_{X_1}(x_1)$ is a probability mass function, which always sums to 1, we only need to preserve $(|\mathcal X_1| - 1)$ of the equations (III.14). Suppose $\{x_{1,1}, \dots, x_{1,|\mathcal X_1|-1}\} \subset \mathcal X_1$ are the $(|\mathcal X_1| - 1)$ different elements of interest. Consider the following set of $k = |\mathcal X_1| + 5$ dimensional vectors, consisting of $|\mathcal U_1|$ elements:

$$\mathcal J := \Big\{\big(f_{x_{1,1}}(u_1), \dots, f_{x_{1,|\mathcal X_1|-1}}(u_1), f_{R_1}(u_1), f_{R_2}(u_1), f_{R_3}(u_1), f_{d_1}(u_1), f_{d_2}(u_1), f_{d_3}(u_1)\big)\Big\}_{u_1 \in \mathcal U_1}.$$

According to the above equations, the vector

$$\mathbf a = \big(p_{X_1}(x_{1,1}), \dots, p_{X_1}(x_{1,|\mathcal X_1|-1}), I(X_1;U_1), I(X^2;U_2|U_1), I(X^3;\hat X_2^3|U^2), E[d_1(X_1,\hat X_1)], E[d_2(X^2,\hat X^2)], E[d_3(X^3,\hat X^3)]\big)$$

is in the convex hull of $\mathcal J$. By the Carathéodory theorem [23], there exist $(k + 1)$ vectors in $\mathcal J$ such that $\mathbf a$ can be expressed as a convex combination of these vectors. Hence there exist a subset $\mathcal U_1' \subset \mathcal U_1$ with $|\mathcal U_1'| = k + 1$ and coefficients $\{\alpha_{u_1}\}_{u_1 \in \mathcal U_1'}$, with $\alpha_{u_1} \ge 0$ and $\sum_{u_1 \in \mathcal U_1'} \alpha_{u_1} = 1$, such that

$$
\begin{aligned}
p_{X_1}(x_1) &= \sum_{u_1 \in \mathcal U_1'} \alpha_{u_1} f_{x_1}(u_1), \quad \forall x_1 \in \mathcal X_1, \\
&\;\;\vdots \\
E[d_3(X^3, \hat X^3)] &= \sum_{u_1 \in \mathcal U_1'} \alpha_{u_1} f_{d_3}(u_1).
\end{aligned}
$$

Replace $U_1$ by a new random variable $U_1^*$ on the alphabet $\mathcal U_1'$ with $\Pr(U_1^* = u_1) = \alpha_{u_1}$. Fix the conditional distributions $p_{X_1|U_1^*} = p_{X_1|U_1}$, $p_{U_2^*|X^2U_1^*} = p_{U_2|X^2U_1}$, and $p_{\hat X_2^{3*}|X^3U^{2*}} = p_{\hat X_2^3|X^3U^2}$, and the function $g_1$. This preserves the marginal distribution of $X_1$ and all the mutual informations and expected distortions in (III.15) to (III.20). The gain is that the new random variable $U_1^*$ takes values in a smaller alphabet $\mathcal U_1'$ whose size is independent of $n$.
Note that, because of the structure of the joint distribution

$$p_{X^3U^{2*}\hat X_2^{3*}} = p_{U_1^*}\, p_{X_1|U_1^*}\, p_{X_2^3|X_1}\, p_{U_2^*|X^2U_1^*}\, p_{\hat X_2^{3*}|X^3U^{2*}},$$

the Markov chains $U_1^* - X_1 - X_2^3$ and $U_2^* - (X^2, U_1^*) - X_3$ hold. Because the marginal distribution $p_{X_1}$ is not changed, the joint distribution $p_{X^3}$ also remains unchanged and is consistent with the requirements of the problem. However, the distributions of $U_2$ and $\hat X_2^3$ may change, so we use $U_2^*$ and $\hat X_2^{3*}$ to denote the corresponding random variables associated with $U_1^*$. They still take values in the alphabets $\mathcal U_2$ and $\hat{\mathcal X}_2 \times \hat{\mathcal X}_3$. The function $g_1$ is unchanged: $g_1(u_1, u_2) = g_1(u_1^*, u_2^*)$ whenever $(u_1, u_2) = (u_1^*, u_2^*)$. Only the domain of $g_1$ shrinks, from $\mathcal U_1 \times \mathcal U_2$ to $\mathcal U_1' \times \mathcal U_2$.

So far the alphabet $\mathcal U_1$ has been reduced to $\mathcal U_1'$, whose cardinality

$$|\mathcal U_1'| = k + 1 = |\mathcal X_1| + 6$$

is independent of $n$, while all the rate and distortion constraints and the Markov chains still hold. We next turn to the alphabet $\mathcal U_2$.
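The cardinality bound only uses the existence statement of Carathéodory's theorem. Because $\mathcal U_1$ is finite, $\mathbf a = \sum_{u_1} p_{U_1}(u_1)\,\mathbf v(u_1)$ with $\mathbf v(u_1) \in \mathcal J$ is a finite mixture. For a finite mixture the reduction can also be carried out constructively. The sketch below shows one standard constructive reduction; it is not necessarily the argument of [23], and the function names are hypothetical. While more than $k + 1$ points remain, it finds an affine dependence among them and shifts weight along it until one weight hits zero. It uses exact `Fraction` arithmetic to avoid rounding.

```python
from fractions import Fraction

def affine_dependence(points):
    """Nonzero lam with sum(lam) == 0 and sum_i lam[i] * points[i] == 0, or None.

    Solves the homogeneous system (one row per coordinate plus a row of ones)
    by exact Gauss-Jordan elimination over Fractions."""
    m, k = len(points), len(points[0])
    rows = [[Fraction(points[j][r]) for j in range(m)] for r in range(k)]
    rows.append([Fraction(1)] * m)
    pivots = []
    for col in range(m):
        r = len(pivots)
        if r == len(rows):
            break
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        lead = rows[r][col]
        rows[r] = [v / lead for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(col)
    free = [c for c in range(m) if c not in pivots]
    if not free:
        return None
    lam = [Fraction(0)] * m
    lam[free[0]] = Fraction(1)
    for row_idx, col in enumerate(pivots):
        lam[col] = -rows[row_idx][free[0]]
    return lam

def caratheodory_reduce(points, weights):
    """Rewrite a convex combination of points in R^k using at most k + 1 of them."""
    pts = [[Fraction(v) for v in p] for p in points]
    w = [Fraction(x) for x in weights]
    k = len(pts[0])
    while True:
        keep = [i for i, x in enumerate(w) if x > 0]  # drop zero-weight points
        pts, w = [pts[i] for i in keep], [w[i] for i in keep]
        if len(pts) <= k + 1:
            return pts, w
        # More than k + 1 points in R^k are always affinely dependent.
        lam = affine_dependence(pts)
        # lam is nonzero and sums to zero, so it has a positive entry.
        t = min(w[i] / lam[i] for i in range(len(w)) if lam[i] > 0)
        # Subtracting t * lam keeps the total weight and the combined point fixed,
        # keeps all weights nonnegative, and zeroes out the minimizing index.
        w = [w[i] - t * lam[i] for i in range(len(w))]

# Toy usage: six points in R^2 with equal weights reduce to at most three.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
w = [Fraction(1, 6)] * 6
new_pts, new_w = caratheodory_reduce(pts, w)
```

In the setting above, `points` would be the vectors of $\mathcal J$ and `weights` the values $p_{U_1}(u_1)$. The surviving points index $\mathcal U_1'$, and the returned weights play the role of $\alpha_{u_1}$. With $k = |\mathcal X_1| + 5$, at most $|\mathcal X_1| + 6$ letters survive.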

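Similarly to (III.14) to (III.20), we can define functionals $f_{x_1x_2u_1^*}$ for all $(x_1, x_2, u_1^*) \in \mathcal X_1 \times \mathcal X_2 \times \mathcal U_1'$, together with $f_{R'_2}$, $f_{R'_3}$, $f_{d'_1}$, $f_{d'_2}$, and $f_{d'_3}$, such that

$$
\begin{aligned}
p_{X^2U_1^*}(x_1, x_2, u_1^*) &=: \sum_{u_2 \in \mathcal U_2} p_{U_2^*}(u_2)\, f_{x_1x_2u_1^*}(u_2, p_{X^2U_1^*|U_2^*}), \quad \forall (x_1, x_2, u_1^*) \in \mathcal X_1 \times \mathcal X_2 \times \mathcal U_1', &&\text{(III.21)} \\
I(X^2; U_2^*|U_1^*) &=: \sum_{u_2 \in \mathcal U_2} p_{U_2^*}(u_2)\, f_{R'_2}(u_2, p_{X^2U_1^*|U_2^*}), &&\text{(III.22)} \\
I(X^3; \hat X_2^{3*}|U^{2*}) &=: \sum_{u_2 \in \mathcal U_2} p_{U_2^*}(u_2)\, f_{R'_3}(u_2, p_{X^3U_1^*\hat X_2^{3*}|U_2^*}), &&\text{(III.23)} \\
E[d_1(X_1, \hat X_1^*)] &=: \sum_{u_2 \in \mathcal U_2} p_{U_2^*}(u_2)\, f_{d'_1}(u_2, p_{X_1U_1^*|U_2^*}, g_1), &&\text{(III.24)} \\
E[d_2(X^2, \hat X^{2*})] &=: \sum_{u_2 \in \mathcal U_2} p_{U_2^*}(u_2)\, f_{d'_2}(u_2, p_{X^2U_1^*\hat X_2^*|U_2^*}, g_1), &&\text{(III.25)} \\
E[d_3(X^3, \hat X^{3*})] &=: \sum_{u_2 \in \mathcal U_2} p_{U_2^*}(u_2)\, f_{d'_3}(u_2, p_{X^3U_1^*\hat X_2^{3*}|U_2^*}, g_1). &&\text{(III.26)}
\end{aligned}
$$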
Because the Markov chains $U_1^* - X_1 - X_2^3$ and $U_2^* - (X^2, U_1^*) - X_3$ hold, the joint distribution can be written as

$$p_{X^3U^{2*}\hat X_2^{3*}} = p_{U_2^*}\, p_{X^2U_1^*|U_2^*}\, p_{X_3|X^2}\, p_{\hat X_2^{3*}|X^3U^{2*}}.$$

Fix $p_{X^2U_1^*|U_2^*}$, $p_{X_3|X^2}$, $p_{\hat X_2^{3*}|X^3U^{2*}}$, and $g_1$. Then the functionals become functions depending solely on $u_2$. Following the same method, we can replace $U_2^*$ by $U_2^{**}$ and preserve the marginal distribution $p_{X^2U_1^*}$ together with the mutual informations and expected distortions in (III.22) to (III.26). Altogether $|\mathcal X_1||\mathcal X_2||\mathcal U_1'| + 4$ quantities must be preserved, so the cardinality of the alphabet can be limited to

$$|\mathcal U_2^{**}| = |\mathcal X_1||\mathcal X_2||\mathcal U_1'| + 5 = |\mathcal X_1|^2|\mathcal X_2| + 6|\mathcal X_1||\mathcal X_2| + 5.$$

In addition, by the structure of the joint distribution, the Markov chain $U_2^{**} - (X^2, U_1^*) - X_3$ can be verified to hold. Finally, the particular values of $(U_1^*, U_2^{**})$ never influence the performance of the system. We can therefore relabel them by $\{1, 2, \dots, |\mathcal U_1'|\} \times \{1, 2, \dots, |\mathcal U_2^{**}|\}$, so that their values do not depend on the original large alphabets $\mathcal U_1$ and $\mathcal U_2$. We completely discard the old auxiliary random variables and rename the new random variables $(U_1^*, U_2^{**})$ as $(U_1, U_2)$ to continue the proof.

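Up to now we have shown that for any admissible rate-distortion tuple $(\mathbf R, \mathbf D)$, every $\epsilon > 0$, and all $n > N(\epsilon)$, we can find $(U^2, \hat X_2^3, g_1)$ satisfying

$$
\begin{aligned}
R_1 + \epsilon &\ge I(X_1; U_1), &&\text{(III.27)} \\
R_2 + \epsilon &\ge I(X^2; U_2|U_1), &&\text{(III.28)} \\
R_3 + \epsilon &\ge I(X^3; \hat X_2^3|U^2), &&\text{(III.29)} \\
D_j + \epsilon &\ge E\big[d_j(X^j, \hat X^j)\big], \quad j = 1, \dots, T, &&\text{(III.30)}
\end{aligned}
$$

$$\hat X_1 = g_1(U^2), \qquad |\mathcal U_1| = |\mathcal X_1| + 6, \qquad |\mathcal U_2| = |\mathcal X_1|^2|\mathcal X_2| + 6|\mathcal X_1||\mathcal X_2| + 5,$$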
and the Markov chains $U_1 - X_1 - X_2^3$ and $U_2 - (X^2, U_1) - X_3$ hold, which implies

$$I(U_1; X_2^3|X_1) = 0, \tag{III.31}$$

$$I(U_2; X_3|X^2, U_1) = 0. \tag{III.32}$$

IV. Taking limits

Note that for each $(\epsilon, n)$, $|\mathcal U_j|$ is finite and independent of $(\epsilon, n)$ for $j = 1, 2$. Therefore the conditional distribution $p_{U^2\hat X_2^3|X^3,\epsilon,n}$ is a finite-dimensional stochastic matrix taking values in a compact set, and $g_{1,\epsilon,n}$ has only finitely many possibilities.

Let $\{\epsilon_l\}_{l=1}^{\infty}$ be any sequence of real numbers such that $\epsilon_l > 0$ and $\epsilon_l \to 0$ as $l \to \infty$. Let $\{n_l\}$ be any sequence of blocklengths such that $n_l > N(\epsilon_l)$ for all $l$. Since $g_{1,\epsilon,n}$ takes values in a finite set, there exist a function $g_1^*$ and a subsequence $\{\epsilon_{l_k}\}_{k=1}^{\infty}$ such that $g_{1,\epsilon_{l_k},n_{l_k}} = g_1^*$ for every $k$.

Since $p_{U^2\hat X_2^3|X^3,\epsilon,n}$ lives in a compact set, there exists a further subsequence of $\{p_{U^2\hat X_2^3|X^3,\epsilon_{l_k},n_{l_k}}\}$ that converges to a limit $p^*_{U^2\hat X_2^3|X^3}$. Denote the auxiliary random variables derived from the limit distribution by $(U^{2*}, \hat X_2^{3*})$. Conditional mutual information and expectation are continuous in the underlying probability distributions, so (III.27) to (III.32) become

$$
\begin{aligned}
R_1 &\ge I(X_1; U_1^*), \\
R_2 &\ge I(X^2; U_2^*|U_1^*), \\
R_3 &\ge I(X^3; \hat X_2^{3*}|U^{2*}), \\
D_j &\ge E\big[d_j(X^j, \hat X^{j*})\big], \quad j = 1, 2, 3, \\
I(U_1^*; X_2^3|X_1) &= 0, \\
I(U_2^*; X_3|X^2, U_1^*) &= 0,
\end{aligned}
$$

where $\hat X_1^* := g_1^*(U^{2*})$. The last two equalities imply that the Markov chains $U_1^* - X_1 - X_2^3$ and $U_2^* - (X^2, U_1^*) - X_3$ hold. Therefore $(\mathbf R, \mathbf D)$ belongs to the right-hand side of (5.11). ■